Publications

ICDAR 2024 competition on historical map text detection, recognition, and linking

Abstract

Text on digitized historical maps contains valuable information, e.g., providing georeferenced political and cultural context. The goal of the ICDAR 2024 MapText Competition is to benchmark methods that automatically extract textual content on historical maps (e.g., place names) and connect words to form location phrases. The competition features two primary tasks (text detection and end-to-end text recognition), each with a secondary task of linking words into phrase blocks. Submissions are evaluated on two data sets: 1) the David Rumsey Historical Map Collection, which contains 936 map images covering 80 regions and 183 distinct publication years (from 1623 to 2012); 2) the French Land Registers (created during the 19th century), which contain 145 map images of 50 French cities and towns. The competition received 44 submissions across all tasks. This report presents the motivation for the competition, the tasks, the evaluation metrics, and the submission analysis.

Using exceptional attributed subgraph mining to explore interindividual variability in odor pleasantness processing in the piriform cortex and amygdala

Abstract

In humans, the amygdala and piriform cortex are 2 important brain structures involved in hedonic odor processing. Although the affective processing of odors in these 2 structures has been extensively studied, how each tested individual contributes to the observed global pattern remains poorly understood. The purpose of this study is to examine whether exceptional pattern extraction techniques can improve our understanding of hedonic odor processing in these brain areas while paying particular attention to individual variability. A total of 42 volunteers participated in a functional magnetic resonance imaging (fMRI) study in which they were asked to smell 6 odors and describe their hedonic valence. Classical univariate analyses (statistical parametric mapping) and data mining were performed on the fMRI data. The results from both analyses showed that unpleasant odors preferentially activate the anterior part of the left piriform cortex. Moreover, the data mining approach revealed specific patterns for pleasant and unpleasant odors in the piriform cortex but also in the amygdala. The approach also revealed the contribution of each of the 42 individuals to the observed patterns. Taken together, these results suggest that the data mining approach can be used alongside standard fMRI analyses to provide complementary information regarding spatial location and the contribution of individuals to the observed patterns.

Similarity problems in paragraph justification: An extension to the Knuth-Plass algorithm

By Didier Verna

2024-08-01

In Proceedings of the ACM symposium on document engineering 2024

Abstract

In high-quality typography, consecutive lines beginning or ending with the same word or sequence of characters are considered a defect. We have implemented an extension to TeX’s paragraph justification algorithm which handles this problem. Experimentation shows that getting rid of similarities is both worth addressing and achievable. Our extension automates the detection and avoidance of similarities while leaving the ultimate decision to the professional typographer, thanks to a new adjustable cursor. The extension is simple and lightweight, making it a useful addition to production engines.
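
To make the defect concrete, here is a minimal Python sketch that only detects similarities in a paragraph that has already been broken into lines; it does not reproduce the Knuth-Plass extension, which avoids them during justification. The function names and the min_chars threshold are hypothetical choices for illustration, and "similarity" is approximated here as a shared prefix or suffix of at least min_chars characters.

```python
from typing import List, Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def find_similarities(lines: List[str], min_chars: int = 3) -> List[Tuple[int, str]]:
    """Flag consecutive lines that begin or end with the same character sequence.

    Returns (i, where) pairs, where i is the index of the first line of the
    pair and where is "beginning" or "ending". min_chars is an assumed
    threshold: shorter shared sequences are not reported.
    """
    found = []
    for i in range(len(lines) - 1):
        a, b = lines[i].strip(), lines[i + 1].strip()
        if common_prefix_len(a, b) >= min_chars:
            found.append((i, "beginning"))
        # A common suffix is a common prefix of the reversed strings.
        if common_prefix_len(a[::-1], b[::-1]) >= min_chars:
            found.append((i, "ending"))
    return found


paragraph = [
    "the quick brown fox jumps over",
    "the lazy dog, and the fox jumps over",
    "a fence.",
]
print(find_similarities(paragraph))  # [(0, 'beginning'), (0, 'ending')]
```

A character-based test also catches shared word prefixes (for instance "then" and "the" share three characters), so the threshold is only a crude knob; it should not be read as the adjustable cursor described in the abstract.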

Combining physical and network data for attack detection in water distribution networks

By Côme Frappé–Vialatoux, Pierre Parrend

2024-07-01

In Water distribution systems analysis (WDSA)/computing and control water industry (CCWI) joint conference

Abstract

Water distribution infrastructures are increasingly incorporating IoT in the form of sensing and computing power to improve control over the system and achieve greater adaptability to water demand. This evolution from physical towards cyberphysical systems comes with an attack perimeter that extends into cyberspace. Detecting this novel kind of attack is gaining traction in the scientific community. However, machine learning detection algorithms, which show encouraging results in cybersecurity applications, need training data as close as possible to real-world data in order to perform well in production environments. The availability of such data, with complexity levels on par with real-world infrastructures and with acquisitions from both the physical and cyber spaces, is a bottleneck for the development of machine learning algorithms. This paper addresses this problem by providing an analysis of the currently available cyberphysical datasets in the water distribution field, together with a multi-layer comparison methodology to assess their complexity. This multi-layer approach to evaluating dataset complexity is based on three major axes, namely attack scenarios, network topology and network communications, allowing for a precise look at the strengths and weaknesses of available datasets across a wide spectrum. The results show that currently available datasets emphasize one aspect of real-world complexity but fall short on the others, highlighting the need for a more global approach in further work to propose datasets with increased complexity on multiple aspects at the same time.

Navigating pharmacophore space to identify activity discontinuities: A case study with BCR-ABL

Abstract

The exploration of chemical space is a fundamental aspect of chemoinformatics, particularly when one explores a large compound data set to relate chemical structures with molecular properties. In this study, we extend our previous work on chemical space visualization at the pharmacophoric level. Instead of using the conventional binary classification of affinity (active vs inactive), we introduce a refined approach that categorizes compounds into four distinct classes based on their activity levels: super active, very active, active, and inactive. This classification enriches the color scheme applied to pharmacophore space, where the color representation of a pharmacophore hypothesis is driven by the associated compounds. Using the BCR-ABL tyrosine kinase as a case study, we identified intriguing regions corresponding to pharmacophore activity discontinuities, providing valuable insights for structure-activity relationship analysis.

SAT-based learning of computation tree logic

By Adrien Pommellet, Daniel Stan, Simon Scatton

2024-07-01

In Proceedings of the 12th international joint conference on automated reasoning (IJCAR’24)

Abstract

The CTL learning problem consists in finding, for a given sample of positive and negative Kripke structures, a distinguishing CTL formula that is verified by the former but not by the latter. Further constraints may bound the size and shape of the desired formula or even ask for its minimality in terms of syntactic size. This synthesis problem is motivated by explanation generation for dissimilar models, e.g., comparing a faulty implementation with the original protocol. We devise a SAT-based encoding for a fixed-size CTL formula, then provide an incremental approach that guarantees minimality. We further report on a prototype implementation whose contribution is twofold: first, it allows us to assess the efficiency of various output fragments and optimizations; second, we can experimentally evaluate this tool by randomly mutating Kripke structures or syntactically introducing errors in higher-level models, then learning distinguishing CTL formulas.
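
Every candidate formula a learner produces must pass the check stated in the first sentence above: it holds in all positive structures and in none of the negative ones. A minimal Python sketch of that check follows, using standard fixpoint-based CTL model checking over small explicit Kripke structures; it does not implement the SAT encoding. The tuple encoding of formulas, the Kripke class and the function names are hypothetical, and transition relations are assumed to be total.

```python
from typing import Dict, List, Set

# Ad hoc encoding for this sketch: a formula is a nested tuple, e.g.
# ("AF", ("atom", "p")). EF, AG and AF are derived from EU and EG.
Formula = tuple


class Kripke:
    def __init__(self, succ: Dict[int, Set[int]], labels: Dict[int, Set[str]], init: Set[int]):
        self.succ = succ      # transition relation, assumed total
        self.labels = labels  # atomic propositions true in each state
        self.init = init      # initial states
        self.states = set(succ)


def sat(k: Kripke, f: Formula) -> Set[int]:
    """States of k satisfying f, computed by the standard fixpoint labelling."""
    op = f[0]
    if op == "true":
        return set(k.states)
    if op == "atom":
        return {s for s in k.states if f[1] in k.labels[s]}
    if op == "not":
        return k.states - sat(k, f[1])
    if op == "and":
        return sat(k, f[1]) & sat(k, f[2])
    if op == "or":
        return sat(k, f[1]) | sat(k, f[2])
    if op == "EX":
        target = sat(k, f[1])
        return {s for s in k.states if k.succ[s] & target}
    if op == "AX":
        target = sat(k, f[1])
        return {s for s in k.states if k.succ[s] <= target}
    if op == "EU":  # least fixpoint of E[f1 U f2]
        left, res = sat(k, f[1]), sat(k, f[2])
        while True:
            new = res | {s for s in left if k.succ[s] & res}
            if new == res:
                return res
            res = new
    if op == "EG":  # greatest fixpoint of EG f1
        res = sat(k, f[1])
        while True:
            new = {s for s in res if k.succ[s] & res}
            if new == res:
                return res
            res = new
    if op == "EF":
        return sat(k, ("EU", ("true",), f[1]))
    if op == "AG":
        return sat(k, ("not", ("EF", ("not", f[1]))))
    if op == "AF":
        return sat(k, ("not", ("EG", ("not", f[1]))))
    raise ValueError(f"unknown operator {op!r}")


def holds(k: Kripke, f: Formula) -> bool:
    """A structure satisfies f when all of its initial states do."""
    return k.init <= sat(k, f)


def distinguishes(f: Formula, positives: List[Kripke], negatives: List[Kripke]) -> bool:
    return all(holds(k, f) for k in positives) and not any(holds(k, f) for k in negatives)


# Positive: state 0 must move to state 1, where p holds forever.
k1 = Kripke({0: {1}, 1: {1}}, {0: set(), 1: {"p"}}, {0})
# Negative: state 0 may also loop on itself, avoiding p forever.
k2 = Kripke({0: {0, 1}, 1: {1}}, {0: set(), 1: {"p"}}, {0})

print(distinguishes(("AF", ("atom", "p")), [k1], [k2]))  # True
print(distinguishes(("EF", ("atom", "p")), [k1], [k2]))  # False: EF p holds in both
```

A naive learner could enumerate formulas by increasing size and return the first one for which distinguishes is true; that yields a minimal formula but explodes combinatorially. This brute-force search is only a swapped-in illustration, not the SAT-based incremental approach described in the abstract.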

New algorithms for multivalued component trees

Abstract

The component tree (CT) can model grey-level images for various image processing and analysis purposes (filtering, segmentation, registration, retrieval, etc.). Its generalized version, the multivalued component tree (MCT), can model images with hierarchically organized values. We provide new tools to handle MCTs: a new algorithm for the construction of MCTs, and two strategies for building the hierarchical orders on values that are required to further build MCTs.
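
For readers unfamiliar with the classic (single-valued) component tree, the sketch below builds one naively for a 1D grey-level signal, where connectivity reduces to runs of consecutive samples. It is quadratic, assumes totally ordered grey levels, and is neither the multivalued construction nor the new algorithm from the paper; the names runs and component_tree are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional

Node = FrozenSet[int]


def runs(mask: List[bool]) -> List[Node]:
    """Connected components of a 1D binary mask, i.e. maximal runs of True."""
    comps: List[Node] = []
    cur: List[int] = []
    for i, m in enumerate(mask):
        if m:
            cur.append(i)
        elif cur:
            comps.append(frozenset(cur))
            cur = []
    if cur:
        comps.append(frozenset(cur))
    return comps


def component_tree(f: List[int]) -> Dict[Node, Optional[Node]]:
    """Naive component tree of a 1D grey-level signal f.

    Nodes are the connected components of the upper level sets
    {i : f[i] >= t} for each grey level t occurring in f. Each node maps to
    its parent, the smallest node strictly containing it; the root maps to None.
    """
    nodes = set()
    for t in sorted(set(f)):
        nodes.update(runs([v >= t for v in f]))
    parent: Dict[Node, Optional[Node]] = {}
    for n in nodes:
        # Components are either nested or disjoint, so strict supersets form a chain.
        supersets = [m for m in nodes if n < m]
        parent[n] = min(supersets, key=len) if supersets else None
    return parent


tree = component_tree([1, 3, 2, 3, 1])
print(sorted(tree[frozenset({1})]))  # [1, 2, 3]
print(tree[frozenset(range(5))])     # None
```

In this example the two peaks at indices 1 and 3 are leaves that merge into the component {1, 2, 3} at level 2, which is in turn contained in the root covering the whole signal. An MCT replaces the total order on grey levels with a hierarchical order on values, which this sketch does not handle.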

Translation of semi-extended regular expressions using derivatives

By Antoine Martin, Etienne Renault, Alexandre Duret-Lutz

2024-06-20

In Proceedings of the 28th international conference on implementation and applications of automata (CIAA’24)

Abstract

We generalize Antimirov’s notion of linear form of a regular expression to the Semi-Extended Regular Expressions typically used in the Property Specification Language or SystemVerilog Assertions. Doing so requires extending the construction to handle more operators and to deal with expressions over alphabets $\Sigma=2^{AP}$ of valuations of atomic propositions. Using linear forms to construct automata labeled by Boolean expressions suggests heuristics, which we evaluate. Finally, we study a variant of this translation to automata with accepting transitions: this construction is more natural and yields smaller automata.
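
As background for the derivative-based translation, the sketch below computes Antimirov linear forms for plain regular expressions over single characters and explores the resulting partial-derivative automaton. It deliberately omits what the paper adds: the semi-extended operators, alphabets of valuations of atomic propositions, Boolean-labeled edges, and transition-based acceptance. The tuple encoding and all function names are hypothetical.

```python
from typing import Set, Tuple

# Ad hoc encoding: ("eps",) | ("chr", c) | ("alt", r, s) | ("cat", r, s) | ("star", r)
Regex = tuple
EPS = ("eps",)


def nullable(r: Regex) -> bool:
    """True when r accepts the empty word."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "chr":
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # "cat"


def cat(r: Regex, s: Regex) -> Regex:
    # Smart constructor: drop a leading epsilon so derived states stay small.
    return s if r == EPS else ("cat", r, s)


def linear_form(r: Regex) -> Set[Tuple[str, Regex]]:
    """Pairs (a, d) such that r is the union of the a.d (plus epsilon if nullable)."""
    tag = r[0]
    if tag == "eps":
        return set()
    if tag == "chr":
        return {(r[1], EPS)}
    if tag == "alt":
        return linear_form(r[1]) | linear_form(r[2])
    if tag == "cat":
        lf = {(a, cat(d, r[2])) for a, d in linear_form(r[1])}
        if nullable(r[1]):
            lf |= linear_form(r[2])
        return lf
    return {(a, cat(d, r)) for a, d in linear_form(r[1])}  # "star"


def to_automaton(r: Regex):
    """Explore the partial-derivative automaton; states are expressions."""
    states, todo, delta = {r}, [r], {}
    while todo:
        q = todo.pop()
        for a, d in linear_form(q):
            delta.setdefault((q, a), set()).add(d)
            if d not in states:
                states.add(d)
                todo.append(d)
    accepting = {q for q in states if nullable(q)}
    return states, delta, accepting


def accepts(r: Regex, word: str) -> bool:
    _, delta, accepting = to_automaton(r)
    current = {r}
    for a in word:
        current = set().union(*(delta.get((q, a), set()) for q in current))
    return bool(current & accepting)


# (a|b)*b: words over {a, b} ending with b
r = ("cat", ("star", ("alt", ("chr", "a"), ("chr", "b"))), ("chr", "b"))
print([accepts(r, w) for w in ["b", "aab", "aba", ""]])  # [True, True, False, False]
```

For (a|b)*b the exploration yields only two states, the expression itself and epsilon, and a state is accepting exactly when its expression is nullable.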

Neural Koopman prior for data assimilation

Abstract

With the increasing availability of large-scale datasets, computational power, and tools like automatic differentiation and expressive neural network architectures, sequential data are now often treated in a data-driven way, with a dynamical model trained from the observation data. While neural networks are often seen as uninterpretable black-box architectures, they can still benefit from physical priors on the data and from mathematical knowledge. In this paper, we use a neural network architecture which leverages the long-known Koopman operator theory to embed dynamical systems in latent spaces where their dynamics can be described linearly, enabling a number of appealing features. We introduce methods to train such a model for long-term continuous reconstruction, even in difficult contexts where the data come as irregularly sampled time series. The potential for self-supervised learning is also demonstrated, as we show the promising use of trained dynamical models as priors for variational data assimilation techniques, with applications to, e.g., time series interpolation and forecasting.
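
The central idea, a latent space in which the dynamics become linear, can be illustrated without a neural network. The sketch below swaps the learned encoder for hand-picked observables on a hypothetical toy system whose Koopman operator has a known three-dimensional invariant subspace, fits the linear operator K by least squares (a DMD-style fit, not the paper's training method), and predicts several steps ahead with a matrix power. The system, its constants A, B, C and the function names are assumptions for illustration; the code requires NumPy.

```python
import numpy as np

# Hypothetical toy system (not from the paper): x1' = A*x1, x2' = B*x2 + C*x1**2.
# The observables (x1, x2, x1**2) span a Koopman-invariant subspace.
A, B, C = 0.9, 0.5, 1.0


def step(x: np.ndarray) -> np.ndarray:
    """One step of the nonlinear system."""
    return np.array([A * x[0], B * x[1] + C * x[0] ** 2])


def observables(x: np.ndarray) -> np.ndarray:
    """Hand-picked embedding standing in for a learned neural encoder."""
    return np.array([x[0], x[1], x[0] ** 2])


def simulate(x0: np.ndarray, n: int) -> list:
    traj = [x0]
    for _ in range(n):
        traj.append(step(traj[-1]))
    return traj


def fit_koopman(trajectories) -> np.ndarray:
    """Least-squares estimate of K such that z_{k+1} is approximately K z_k."""
    z_now, z_next = [], []
    for traj in trajectories:
        for x, y in zip(traj[:-1], traj[1:]):
            z_now.append(observables(x))
            z_next.append(observables(y))
    # Solve z_now @ K.T ~= z_next in the least-squares sense.
    k_transposed, *_ = np.linalg.lstsq(np.array(z_now), np.array(z_next), rcond=None)
    return k_transposed.T


rng = np.random.default_rng(0)
K = fit_koopman([simulate(rng.normal(size=2), 20) for _ in range(5)])

# Linear prediction 7 steps ahead in latent space; the first two
# latent coordinates are the original state variables.
x0 = np.array([1.0, -0.5])
z_pred = np.linalg.matrix_power(K, 7) @ observables(x0)
x_true = simulate(x0, 7)[-1]
print(np.allclose(z_pred[:2], x_true))  # True
```

Because the latent dynamics are linear, predicting n steps ahead is a single product with K^n. With a continuous-time generator L, the same idea would use exp(tL) to reach arbitrary, irregularly spaced times; this sketch stays in discrete time.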