Publications

Structural and spectral analysis of dynamic graphs for attack detection

By Majed Jaber, Nicolas Boutry, Pierre Parrend

2023-07-01

In Rencontre des jeunes chercheurs en intelligence artificielle (RJCIA-2023)

Abstract

Cyberattacks are a constant threat. Many approaches exist for detecting suspicious behavior, but very few of them seem to exploit the considerable potential of mathematical approaches such as spectral graph analysis, which is known to extract topological features of a graph from its Laplacian spectrum. For this reason, we model the network as a dynamic graph composed of nodes (representing the devices) and edges (representing the requests), and we compute its Laplacian spectrum over time. Because a significant change in topology induces a significant change in the spectrum, the spectrum appears to be key to detecting threats. Dynamic spectrum-based metrics have been developed for this purpose.
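The idea of tracking a Laplacian spectrum over time can be illustrated with a minimal sketch. It is not the authors' method. The combinatorial Laplacian L = D - A, the Euclidean distance between consecutive spectra, and the toy snapshots are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def laplacian_spectrum(n_nodes, edges):
    """Sorted eigenvalues of the combinatorial Laplacian L = D - A
    of an undirected, unweighted graph on n_nodes nodes."""
    adjacency = np.zeros((n_nodes, n_nodes))
    for u, v in edges:
        if u != v:  # ignore self-loops
            adjacency[u, v] = 1.0
            adjacency[v, u] = 1.0
    degree = np.diag(adjacency.sum(axis=1))
    # eigvalsh handles symmetric matrices and returns ascending eigenvalues
    return np.linalg.eigvalsh(degree - adjacency)

def spectral_distance(spec_a, spec_b):
    """Euclidean distance between two spectra of equal length
    (an assumed metric, chosen only for illustration)."""
    return float(np.linalg.norm(spec_a - spec_b))

# Toy snapshots of a 4-device network (hypothetical data).
snapshots = [
    [(0, 1), (1, 2), (2, 3)],                          # chain of requests
    [(0, 1), (1, 2), (2, 3)],                          # unchanged topology
    [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)],  # every device contacts every other
]

spectra = [laplacian_spectrum(4, edges) for edges in snapshots]
distances = [spectral_distance(a, b) for a, b in zip(spectra, spectra[1:])]
print(distances)
```

The first distance is zero because the topology does not change, while the second is large because the graph jumps from a path to a complete graph. A detector built on this idea would flag time steps whose distance exceeds some threshold, which would have to be calibrated on real traffic.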

Structural analysis of the additive noise impact on the $\alpha$-tree

By Baptiste Esteban, Guillaume Tochon, Edwin Carlinet, Didier Verna

2023-06-30

In Proceedings of the 20th international conference on computer analysis of images and patterns (CAIP)

Abstract

Hierarchical representations are very convenient tools when working with images. Among them, the $\alpha$-tree is the basis of several powerful hierarchies used for various applications such as image simplification, object detection, or segmentation. However, it has been demonstrated that these tasks are very sensitive to the noise corrupting the image. While the quality of some $\alpha$-tree applications has been studied, including some with noisy images, the noise impact on the whole structure has been little investigated. Thus, in this paper, we examine the structure of $\alpha$-trees built on images corrupted by some noise with respect to the noise level. We compare its effects on constant and natural images, with different kinds of content, and we demonstrate the relation between the noise level and the distribution of every $\alpha$-tree node depth. Furthermore, we extend this study to the node persistence under a given energy criterion, and we propose a novel energy definition that allows assessing the robustness of a region to the noise.
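To make the sensitivity to noise concrete, here is a minimal sketch of $\alpha$-connected components, the regions that form one level of an $\alpha$-tree. It assumes a 1D signal in which each sample is adjacent only to its neighbours and uses the absolute difference as dissimilarity. Real $\alpha$-trees are built on 2D images, and the function names and sample values are hypothetical.

```python
def alpha_components(signal, alpha):
    """Label the alpha-connected components of a non-empty 1D signal.

    Two adjacent samples belong to the same component when their
    absolute difference is at most alpha.
    """
    labels = [0]
    for previous, current in zip(signal, signal[1:]):
        same_region = abs(current - previous) <= alpha
        labels.append(labels[-1] if same_region else labels[-1] + 1)
    return labels

def count_components(signal, alpha):
    """Number of alpha-connected components at the given alpha level."""
    return alpha_components(signal, alpha)[-1] + 1

# Two flat regions, without and with small additive noise (toy values).
clean = [10, 10, 10, 10, 50, 50, 50, 50]
noisy = [10, 12, 9, 11, 50, 48, 51, 49]

for alpha in (0, 3):
    print(alpha, count_components(clean, alpha), count_components(noisy, alpha))
```

At $\alpha = 0$ the clean signal splits into its two flat regions, whereas the noisy one fragments into one component per sample. The two regions only reappear in the noisy signal once $\alpha$ exceeds the noise amplitude, which means the corresponding nodes sit at a different level of the hierarchy.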

Adaptive test recommendation for mastery learning

By Nassim Bouarour, Idir Benouaret, Cédric D’Ham, Sihem Amer-Yahia

2023-06-12

In Proceedings of the 2nd international workshop on data systems education: Bridging education practice with education research

Abstract

We tackle the problem of recommending tests to learners to achieve upskilling. Our work is grounded in two learning theories: mastery learning, an instructional strategy that guides learners by providing them with tests of increasing difficulty, reviewing their test results, and iterating until they reach a level of mastery; and Flow Theory, which identifies different test zones (frustration, learnable, flow, and boredom) to determine the best k tests to recommend to a learner. We formalize the AdUp Problem and develop a multi-objective optimization solution that adapts the difficulty of recommended tests to the learner’s predicted performance, aptitude, and skill gap. We leverage existing models to simulate learner behavior and run experiments to demonstrate that our formalization is best suited to attaining skill mastery. We discuss open research directions, including the applicability of reinforcement learning and the recommendation of peers in collaborative projects.

A benchmark of nested named entity recognition approaches in historical structured documents

By Solenn Tual, Nathalie Abadie, Joseph Chazalon, Bertrand Duménieu, Edwin Carlinet

2023-06-01

In Proceedings of the international conference on document analysis and recognition (ICDAR 2023)

Abstract

Named Entity Recognition (NER) is a key step in the creation of structured data from digitised historical documents. Traditional NER approaches deal with flat named entities, whereas entities are often nested. For example, a postal address might contain a street name and a number. This work compares three nested NER approaches, including two state-of-the-art approaches using Transformer-based architectures. We introduce a new Transformer-based approach based on joint labelling and semantic weighting of errors, evaluated on a collection of 19th-century Paris trade directories. We evaluate approaches regarding the impact of supervised fine-tuning, unsupervised pre-training with noisy texts, and variation of IOB tagging formats. Our results show that while nested NER approaches enable extracting structured data directly, they do not benefit from the extra knowledge provided during training and reach a performance similar to the base approach on flat entities. Even though all 3 approaches perform well in terms of F1-scores, joint labelling is most suitable for hierarchically structured data. Finally, our experiments reveal the superiority of the IO tagging format on such data.
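The difference between the IOB and IO tagging formats, and the general idea of joint labelling for nested entities, can be sketched as follows. This is an illustration only. The entity labels (`ADDR`, `STREET`, `NUMBER`), the example tokens, the `+` separator, and the function names are assumptions and are not taken from the paper.

```python
def tag_spans(n_tokens, spans, scheme="IOB"):
    """Turn (start, end, label) spans, end exclusive, into per-token tags.

    With scheme="IOB" the first token of an entity gets a "B-" prefix;
    with scheme="IO" every token of an entity gets "I-".
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        for i in range(start, end):
            prefix = "B" if (scheme == "IOB" and i == start) else "I"
            tags[i] = f"{prefix}-{label}"
    return tags

def joint_labels(level_tags):
    """Merge the tags of each nesting level into one label per token."""
    return ["+".join(column) for column in zip(*level_tags)]

# Hypothetical directory entry: an address containing a street and a number.
tokens = ["Dupont", ",", "rue", "de", "Rivoli", ",", "12"]
outer = [(2, 7, "ADDR")]                         # top-level entity
inner = [(2, 5, "STREET"), (6, 7, "NUMBER")]     # nested entities

iob_outer = tag_spans(len(tokens), outer, scheme="IOB")
io_outer = tag_spans(len(tokens), outer, scheme="IO")
io_inner = tag_spans(len(tokens), inner, scheme="IO")
joint = joint_labels([io_outer, io_inner])

for token, label in zip(tokens, joint):
    print(f"{token}\t{label}")
```

With joint labels, a flat sequence tagger predicts a single label per token that encodes every nesting level at once. The IO format yields fewer distinct labels than IOB, at the cost of not marking where adjacent entities of the same type begin.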

Clustering en chémoinformatique pour le raffinement de l’activité des molécules

By Maroua Lejmi, Ilef Ben Slima, Bertrand Cuissart, Nida Meddouri, Ronan Bureau, Alban Lepailleur, Jean-Luc Lamotte, Amel Borgi

2023-06-01

In Proceedings of the second computer science UTM PhD symposium

Abstract

In drug design, chemoinformatics uses computational and mathematical methods to analyze chemical and biological data and to try to identify promising molecules at a very early stage. In our context, we transform the molecules so as to retain only their pharmacophoric features (the active part of the molecule). The objective of this work is to refine the activity of the molecules that will be used in the drug design process into activity classes. This will give chemists and pharmacists a better visualization and understanding of the activity of the molecules, and will provide finer-grained data for the later development of a model for predicting molecules of therapeutic interest.

Could the topology of virtual processors affect the performance of a BSD-family OS running in a VM?

By David Beserra, Marc Espie, Jean Araujo, Léo Tomasimo, Hector Poncins, Hadrien-Samrek Lacombe, Thomas Vondracek

2023-06-01

In 18th iberian conference on information systems and technologies (CISTI’2023)

Abstract

Virtual machines are an essential technology in distributed and pervasive systems. One of their configurable parameters is the topology of the virtual processing system, which can potentially impact performance. In this work, we verify how different virtual processing topologies affect the performance of VMs running BSD OSes. We conclude that for some types of applications the topology does not affect VM performance, while for others it does, and that the performance impact also depends on the OS adopted by the VM.

CRACS: Compaction of rules in anticipatory classifier systems

By Romain Orhand, Pierre Collet, Pierre Parrend, Anne Jeannin-Girardon

2023-06-01

In Proceedings of the companion conference on genetic and evolutionary computation

Abstract

Rule Compaction of populations of Learning Classifier Systems (LCS) has always been a topic of interest to get more insights into the discovered underlying patterns from the data or to remove useless classifiers from the populations. However, these techniques have neither been used nor adapted to Anticipatory Learning Classifier Systems (ALCS). ALCS differ from other LCS in that they build models of their environments from which decision policies to solve their learning tasks are learned. We thus propose CRACS (Compaction of Rules in Anticipatory Classifier Systems), a compaction algorithm for ALCS that aims to reduce the size of their environmental models without impairing these models or the ability of these systems to solve their tasks. CRACS relies on filters applied to classifiers and subsumption principles. The capabilities of our compaction algorithm have been studied with three different ALCS on a thorough benchmark of 23 mazes of various levels of environmental uncertainty. The results show that CRACS reduces the size of populations of classifiers while the learned models of environments and the ability of ALCS to solve their tasks are preserved.

Explorer les débats parlementaires français de la troisième république par leurs sujets

By Marie Puren, Aurélien Pellet

2023-06-01

In Humanistica 2023

Abstract

This article compares three methods for exploring large corpora of historical documents by their topics. We work here on the French parliamentary debates of the Third Republic, which lend themselves particularly well to this type of analysis. After presenting the context of this study, we report the results obtained with three methods drawn from natural language processing and applied to texts published between 1876 and 1914: latent Dirichlet allocation, word embeddings, and transfer learning.

L’identification des projets de logiciel libre accessibles aux nouveaux contributeurs

By Paul Hervot, Benoît Crespin

2023-06-01

In EIAH2023 : 11ème conférence sur les environnements informatiques pour l’apprentissage humain

Abstract

FOSS makes up an increasing share of the public and industrial software landscape, notably because of its transparency and democratic governance. However, simply publishing the source code of a piece of software does not automatically make it accessible, and many barriers impede new contributors approaching these projects. Through large-scale mining of the Software Heritage archive, we test the relevance of three signals for identifying FOSS projects that are accessible to new contributors. Our results show a positive correlation between the number of new contributors to a project who successfully bring their contribution to completion and the presence of contributing guidelines, as well as between that same number and the number of recent unique contributors to the project. Such signals could find a use in the teaching of FOSS practices, helping teachers select accessible projects for their students.
