Publications

SAT-based learning of computation tree logic

By Adrien Pommellet, Daniel Stan, Simon Scatton

2024-07-01

In Proceedings of the 12th international joint conference (IJCAR’24)

Abstract

The CTL learning problem consists in finding for a given sample of positive and negative Kripke structures a distinguishing CTL formula that is verified by the former but not by the latter. Further constraints may bound the size and shape of the desired formula or even ask for its minimality in terms of syntactic size. This synthesis problem is motivated by explanation generation for dissimilar models, e.g. comparing a faulty implementation with the original protocol. We devise a SAT-based encoding for a fixed size CTL formula, then provide an incremental approach that guarantees minimality. We further report on a prototype implementation whose contribution is twofold: first, it allows us to assess the efficiency of various output fragments and optimizations. Secondly, we can experimentally evaluate this tool by randomly mutating Kripke structures or syntactically introducing errors in higher-level models, then learning CTL distinguishing formulas.

Continue reading

New algorithms for multivalued component trees

Abstract

Les structures arborescentes peuvent modéliser les images—et plus généralement les graphes valués—à des fins de traitement et d’analyse. Dans ce cadre, l’arbre des coupes a été nativement proposé pour les images en niveaux de gris et, plus généralement, pour les graphes valués totalement ordonnés. Il y a dix ans, la notion d’arbre des coupes multivalués a été introduite pour assouplir cette contrainte de niveaux de gris / ordre total. Dans cet article algorithmique, nous fournissons de nouveaux outils pour traiter les arbres des coupes multivalués. Nos contributions sont doubles : (1) nous proposons un nouvel algorithme pour la construction de l’arbre des coupes multivalués ; (2) nous proposons deux stratégies pour construire des ordres hiérarchiques sur des ensembles de valeurs, nécessaires pour construire les arbres des coupes multivalués d’images / de graphes reposant sur de tels ensembles de valeurs.

Continue reading

Translation of semi-extended regular expressions using derivatives

By Antoine Martin, Etienne Renault, Alexandre Duret-Lutz

2024-06-20

In Proceedings of the 28th international conference on implementation and applications of automata (CIAA’24)

Abstract

We generalize Antimirov’s notion of linear form of a regular expression, to the Semi-Extended Regular Expressions typically used in the Property Specification Language or SystemVerilog Assertions. Doing so requires extending the construction to handle more operators, and dealing with expressions over alphabets $\Sigma=2^{AP}$ of valuations of atomic propositions. Using linear forms to construct automata labeled by Boolean expressions suggests heuristics that we evaluate. Finally, we study a variant of this translation to automata with accepting transitions: this construction is more natural and provides smaller automata.

Continue reading

Neural koopman prior for data assimilation

Abstract

With the increasing availability of large scale datasets, computational power and tools like automatic differentiation and expressive neural network architectures, sequential data are now often treated in a data-driven way, with a dynamical model trained from the observation data. While neural networks are often seen as uninterpretable black-box architectures, they can still benefit from physical priors on the data and from mathematical knowledge. In this paper, we use a neural network architecture which leverages the long-known Koopman operator theory to embed dynamical systems in latent spaces where their dynamics can be described linearly, enabling a number of appealing features. We introduce methods that enable to train such a model for long-term continuous reconstruction, even in difficult contexts where the data comes in irregularly-sampled time series. The potential for self-supervised learning is also demonstrated, as we show the promising use of trained dynamical models as priors for variational data assimilation techniques, with applications to e.g. time series interpolation and forecasting.

Continue reading

Transforming gradient-based techniques into interpretable methods

Abstract

The explication of Convolutional Neural Networks (CNN) through xAI techniques often poses challenges in interpretation. The inherent complexity of input features, notably pixels extracted from images, engenders complex correlations. Gradient-based methodologies, exemplified by Integrated Gradients (IG), effectively demonstrate the significance of these features. Nevertheless, the conversion of these explanations into images frequently yields considerable noise. Presently, we introduce GAD (Gradient Artificial Distancing) as a supportive framework for gradient-based techniques. Its primary objective is to accentuate influential regions by establishing distinctions between classes. The essence of GAD is to limit the scope of analysis during visualization and, consequently reduce image noise. Empirical investigations involving occluded images have demonstrated that the identified regions through this methodology indeed play a pivotal role in facilitating class differentiation.

Continue reading

Machine learning-based health environmental-clinical risk scores in european children

Abstract

Early life environmental stressors play an important role in the development of multiple chronic disorders. Previous studies that used environmental risk scores (ERS) to assess the cumulative impact of environmental exposures on health are limited by the diversity of exposures included, especially for early life determinants. We used machine learning methods to build early life exposome risk scores for three health outcomes using environmental, molecular, and clinical data.

Continue reading

Graph-based spectral analysis for detecting cyber attacks

By Majed Jaber, Nicolas Boutry, Pierre Parrend

2024-05-01

In ARES 2024 (the international conference on availability, reliability and security)

Abstract

Spectral graph theory delves into graph properties through their spectral signatures. The eigenvalues of a graph’s Laplacian matrix are crucial for grasping its connectivity and overall structural topology. This research capitalizes on the inherent link between graph topology and spectral characteristics to enhance spectral graph analysis applications. In particular, such connectivity information is key to detect low signals that betray the occurrence of cyberattacks. This paper introduces SpectraTW, a novel spectral graph analysis methodology tailored for monitoring anomalies in network traffic. SpectraTW relies on four spectral indicators, Connectedness, Flooding, Wiriness, and Asymmetry, derived from network attributes and topological variations, that are defined and evaluated. This method interprets networks as evolving graphs, leveraging the Laplacian matrix’s spectral insights to detect shifts in network structure over time. The significance of spectral analysis becomes especially pronounced in the medical IoT domains, where the complex web of devices and the critical nature of healthcare data amplify the need for advanced security measures. Spectral analysis’s ability to swiftly pinpoint irregularities and shift in network traffic aligns well with the medical IoT’s requirements for prompt attack detection.

Continue reading

InaGVAD: A challenging French TV and radio corpus annotated for speech activity detection and speaker gender segmentation

By Doukhan David, Maertens Christine, Le Personnic William, Speroni Ludovic, Dehak Reda

2024-05-01

In Proceedings of the 2024 joint international conference on computational linguistics, language resources and evaluation (LREC-COLING 2024)

Abstract

InaGVAD is an audio corpus collected from 10 French radio and 18 TV channels categorized into 4 groups: generalist radio, music radio, news TV, and generalist TV. It contains 277 1-minute-long annotated recordings aimed at representing the acoustic diversity of French audiovisual programs and was primarily designed to build systems able to monitor men’s and women’s speaking time in media. inaGVAD is provided with Voice Activity Detection (VAD) and Speaker Gender Segmentation (SGS) annotations extended with overlap, speaker traits (gender, age, voice quality), and 10 non-speech event categories. Annotation distributions are detailed for each channel category. This dataset is partitioned into a 1h development and a 3h37 test subset, allowing fair and reproducible system evaluation. A benchmark of 6 freely available VAD software is presented, showing diverse abilities based on channel and non-speech event categories. Two existing SGS systems are evaluated on the corpus and compared against a baseline X-vector transfer learning strategy, trained on the development subset. Results demonstrate that our proposal, trained on a single - but diverse - hour of data, achieved competitive SGS results. The entire inaGVAD package; including corpus, annotations, evaluation scripts, and baseline training code; is made freely accessible, fostering future advancement in the domain.

Continue reading

The Quickref cohort

By Didier Verna

2024-05-01

In ELS 2024, the 17th european lisp symposium

Abstract

The internal architecture of Declt, our reference manual generator for Common Lisp libraries, is currently evolving towards a three-stage pipeline in which the information gathered for documentation purposes is first reified into a formalized set of object-oriented data structures. A side-effect of this evolution is the ability to dump that information for other purposes than documentation. We demonstrate this ability applied to the complete Quicklisp ecosystem. The resulting “cohort” includes more than half a million programmatic definitions, and can be used to gain insight into the morphology of Common Lisp software.

Continue reading

Weakly supervised training for hologram verification in identity documents

By Glen Pouliquen, Guillaume Chiron, Joseph Chazalon, Thierry Géraud, Ahmad Montaser Awal

2024-04-25

In The 18th international conference on document analysis and recognition (ICDAR 2024)

Abstract

We propose a method to remotely verify the authenticity of Optically Variable Devices (OVDs), often referred to as “holograms”, in identity documents. Our method processes video clips captured with smartphones under common lighting conditions, and is evaluated on two public datasets: MIDV-HOLO and MIDV-2020. Thanks to a weakly-supervised training, we optimize a feature extraction and decision pipeline which achieves a new leading performance on MIDV-HOLO, while maintaining a high recall on documents from MIDV-2020 used as attack samples. It is also the first method, to date, to effectively address the photo replacement attack task, and can be trained on either genuine samples, attack samples, or both for increased performance. By enabling to verify OVD shapes and dynamics with very little supervision, this work opens the way towards the use of massive amounts of unlabeled data to build robust remote identity document verification systems on commodity smartphones. Code is available at https://github.com/EPITAResearchLab/pouliquen.24.icdar.

Continue reading