Publications

Similarity problems in paragraph justification: An extension to the Knuth-Plass algorithm

By Didier Verna

2024-08-01

In Proceedings of the ACM symposium on document engineering 2024

Abstract

In high quality typography, consecutive lines beginning or ending with the same word or sequence of characters is considered a defect. We have implemented an extension to TeX’s paragraph justification algorithm which handles this problem. Experimentation shows that getting rid of similarities is both worth addressing and achievable. Our extension automates the detection and avoidance of similarities while leaving the ultimate decision to the professional typographer, thanks to a new adjustable cursor. The extension is simple and lightweight, making it a useful addition to production engines.

Continue reading

Combining physical and network data for attack detection in water distribution networks

By Côme Frappé–Vialatoux, Pierre Parrend

2024-07-01

In Water distribution systems analysis (WDSA)/computing and control water industry (CCWI) joint conference

Abstract

Water distribution infrastructures are increasingly incorporating IoT in the form of sensing and computing power to improve control over the system and achieve a greater adaptability to the water demand. This evolution, from physical towards cyberphysical systems, comes with an attack perimeter extended to the cyberspace. Being able to detect this novel kind of attacks is gaining traction in the scientific community. However, machine learning detection algorithms, which are showing encouraging results in cybersecurity applications, needs training data as close as possible to real world data in order to perform well in production environment. The availability of such data, with complexity levels on par with real world infrastructures, with acquisitions from both from physical and cyber spaces, is a bottleneck for the development of machine learning algorithms. This paper addresses this problem by providing an analysis of the currently available cyberphysical datasets in the water distribution field, together with a multi-layer comparison methodology to assess their complexity. This multi-layer approach to complexity evaluation of datasets is based on three major axes, namely attack scenarios, network topology and network communications, allowing for a precise look at the forces and weaknesses of available datasets across a wide spectrum. The results show that currently available datasets are emphasizing on one aspect of real world complexity but lacks on the others, highlighting the need for a more global approach in further work to propose datasets with an increased complexity on multiple aspects at the same time.

Continue reading

Navigating pharmacophore space to identify activity discontinuities: A case study with BCR-ABL

Abstract

The exploration of chemical space is a fundamental aspect of chemoinformatics, particularly when one explores a large compound data set to relate chemical structures with molecular properties. In this study, we extend our previous work on chemical space visualization at the pharmacophoric level. Instead of using conventional binary classification of affinity (active vs inactive), we introduce a refined approach that categorizes compounds into four distinct classes based on their activity levels: super active, very active, active, and inactive. This classification enriches the color scheme applied to pharmacophore space, where the color representation of a pharmacophore hypothesis is driven by the associated compounds. Using the BCR-ABL tyrosine kinase as a case study, we identified intriguing regions corresponding to pharmacophore activity discontinuities, providing valuable insights for structure-activity relationships analysis.

Continue reading

SAT-based learning of computation tree logic

By Adrien Pommellet, Daniel Stan, Simon Scatton

2024-07-01

In Proceedings of the 12th international joint conference (IJCAR’24)

Abstract

The CTL learning problem consists in finding for a given sample of positive and negative Kripke structures a distinguishing CTL formula that is verified by the former but not by the latter. Further constraints may bound the size and shape of the desired formula or even ask for its minimality in terms of syntactic size. This synthesis problem is motivated by explanation generation for dissimilar models, e.g. comparing a faulty implementation with the original protocol. We devise a SAT-based encoding for a fixed size CTL formula, then provide an incremental approach that guarantees minimality. We further report on a prototype implementation whose contribution is twofold: first, it allows us to assess the efficiency of various output fragments and optimizations. Secondly, we can experimentally evaluate this tool by randomly mutating Kripke structures or syntactically introducing errors in higher-level models, then learning CTL distinguishing formulas.

Continue reading

New algorithms for multivalued component trees

Abstract

The component tree (CT)can model grey-level images for various image processing / analysis purposes (filtering, segmentation, registration, retrieval…). Its generalized version, the multivalued component tree (MCT) can model images with hierarchically organized values. We provide new tools to handle MCTs:a new algorithm for the construction of MCTs;two strategies for building hierarchical orders on values, required to further build MCTs.

Continue reading

Translation of semi-extended regular expressions using derivatives

By Antoine Martin, Etienne Renault, Alexandre Duret-Lutz

2024-06-20

In Proceedings of the 28th international conference on implementation and applications of automata (CIAA’24)

Abstract

We generalize Antimirov’s notion of linear form of a regular expression, to the Semi-Extended Regular Expressions typically used in the Property Specification Language or SystemVerilog Assertions. Doing so requires extending the construction to handle more operators, and dealing with expressions over alphabets $\Sigma=2^{AP}$ of valuations of atomic propositions. Using linear forms to construct automata labeled by Boolean expressions suggests heuristics that we evaluate. Finally, we study a variant of this translation to automata with accepting transitions: this construction is more natural and provides smaller automata.

Continue reading

Neural koopman prior for data assimilation

Abstract

With the increasing availability of large scale datasets, computational power and tools like automatic differentiation and expressive neural network architectures, sequential data are now often treated in a data-driven way, with a dynamical model trained from the observation data. While neural networks are often seen as uninterpretable black-box architectures, they can still benefit from physical priors on the data and from mathematical knowledge. In this paper, we use a neural network architecture which leverages the long-known Koopman operator theory to embed dynamical systems in latent spaces where their dynamics can be described linearly, enabling a number of appealing features. We introduce methods that enable to train such a model for long-term continuous reconstruction, even in difficult contexts where the data comes in irregularly-sampled time series. The potential for self-supervised learning is also demonstrated, as we show the promising use of trained dynamical models as priors for variational data assimilation techniques, with applications to e.g. time series interpolation and forecasting.

Continue reading

Transforming gradient-based techniques into interpretable methods

Abstract

The explication of Convolutional Neural Networks (CNN) through xAI techniques often poses challenges in interpretation. The inherent complexity of input features, notably pixels extracted from images, engenders complex correlations. Gradient-based methodologies, exemplified by Integrated Gradients (IG), effectively demonstrate the significance of these features. Nevertheless, the conversion of these explanations into images frequently yields considerable noise. Presently, we introduce GAD (Gradient Artificial Distancing) as a supportive framework for gradient-based techniques. Its primary objective is to accentuate influential regions by establishing distinctions between classes. The essence of GAD is to limit the scope of analysis during visualization and, consequently reduce image noise. Empirical investigations involving occluded images have demonstrated that the identified regions through this methodology indeed play a pivotal role in facilitating class differentiation.

Continue reading

Machine learning-based health environmental-clinical risk scores in european children

Abstract

Early life environmental stressors play an important role in the development of multiple chronic disorders. Previous studies that used environmental risk scores (ERS) to assess the cumulative impact of environmental exposures on health are limited by the diversity of exposures included, especially for early life determinants. We used machine learning methods to build early life exposome risk scores for three health outcomes using environmental, molecular, and clinical data.

Continue reading

Graph-based spectral analysis for detecting cyber attacks

By Majed Jaber, Nicolas Boutry, Pierre Parrend

2024-05-01

In ARES 2024 (the international conference on availability, reliability and security)

Abstract

Spectral graph theory delves into graph properties through their spectral signatures. The eigenvalues of a graph’s Laplacian matrix are crucial for grasping its connectivity and overall structural topology. This research capitalizes on the inherent link between graph topology and spectral characteristics to enhance spectral graph analysis applications. In particular, such connectivity information is key to detect low signals that betray the occurrence of cyberattacks. This paper introduces SpectraTW, a novel spectral graph analysis methodology tailored for monitoring anomalies in network traffic. SpectraTW relies on four spectral indicators, Connectedness, Flooding, Wiriness, and Asymmetry, derived from network attributes and topological variations, that are defined and evaluated. This method interprets networks as evolving graphs, leveraging the Laplacian matrix’s spectral insights to detect shifts in network structure over time. The significance of spectral analysis becomes especially pronounced in the medical IoT domains, where the complex web of devices and the critical nature of healthcare data amplify the need for advanced security measures. Spectral analysis’s ability to swiftly pinpoint irregularities and shift in network traffic aligns well with the medical IoT’s requirements for prompt attack detection.

Continue reading