Publications

An end-to-end approach for the detection of phishing attacks

By Badis Hammi, Tristan Billot, Danyil Bazain, Nicolas Binand, Maxime Jaen, Chems Mitta, Nour El Madhoun

2024-04-01

In Advanced information networking and applications (AINA)

Abstract

The main approaches and implementations used to counteract phishing attacks involve crowd-sourced blacklists. However, blacklists come with several drawbacks. In this paper, we present a comprehensive approach for the detection of phishing attacks. Our approach uses our own detection engine, which relies on Graph Neural Networks to leverage the hyperlink structure of the websites under analysis. Additionally, we offer a turnkey implementation to end-users in the form of a Mozilla Firefox plugin.
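
The abstract leaves the detection engine's architecture unspecified; purely as an illustration, here is a minimal sketch of how a graph classifier could be run over a website's hyperlink graph, with pages as nodes and hyperlinks as edges. PyTorch Geometric, the two-layer GCN, and all dimensions are assumptions for the example, not the paper's design.

```python
# Sketch: classifying a website's hyperlink graph as phishing vs. benign.
# The GCN architecture, feature dimensions, and pooling are assumptions.
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv, global_mean_pool

class PhishingGNN(torch.nn.Module):
    def __init__(self, num_features: int, hidden: int = 64):
        super().__init__()
        self.conv1 = GCNConv(num_features, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.classifier = torch.nn.Linear(hidden, 2)  # phishing vs. benign

    def forward(self, x, edge_index, batch):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = global_mean_pool(x, batch)   # one embedding per website graph
        return self.classifier(x)

# One page = one node; one hyperlink = one directed edge (dummy data).
x = torch.randn(5, 16)                   # 5 pages, 16 invented features each
edge_index = torch.tensor([[0, 0, 1, 2], [1, 2, 3, 4]])
data = Data(x=x, edge_index=edge_index)
model = PhishingGNN(num_features=16)
logits = model(data.x, data.edge_index, torch.zeros(5, dtype=torch.long))
```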

Continue reading

Automatic vectorization of historical maps: A benchmark

Abstract

Shape vectorization is a key stage of the digitization of large-scale historical maps, especially city maps that exhibit complex and valuable details. Having access to digitized buildings, building blocks, street networks and other geographic content opens numerous new approaches for historical studies such as change tracking, morphological analysis and density estimation. In the context of the digitization of Paris atlases created in the 19th and early 20th centuries, we have designed a supervised pipeline that reliably extracts closed shapes from historical maps. This pipeline is based on a supervised edge filtering stage using deep filters, and a closed shape extraction stage using a watershed transform. It relies on multiple, possibly suboptimal, methodological choices that may hamper the vectorization performance in terms of accuracy and completeness. This paper comprehensively and objectively investigates which solutions are the most adequate among the numerous possibilities. The following contributions are subsequently introduced: (i) we propose an improved training protocol for map digitization; (ii) we introduce a joint optimization of the edge detection and shape extraction stages; (iii) we compare the performance of state-of-the-art deep edge filters with topology-preserving loss functions, including vision transformers; (iv) we evaluate the end-to-end deep learnable watershed against the Meyer watershed. We subsequently design the critical path for a fully automatic extraction of key elements of historical maps. All the data, code, and benchmark results are freely available at https://github.com/soduco/Benchmark_historical_map_vectorization.
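
As a rough illustration of the two-stage design described above, here is a minimal sketch assuming scikit-image and SciPy, with an edge probability map already produced by some deep filter. The seed heuristic and the 0.5/5-pixel thresholds are invented for the example and do not come from the paper.

```python
# Sketch of the two-stage pipeline: a deep edge probability map followed
# by a watershed that extracts closed shapes. scikit-image stands in for
# the paper's learned components; all thresholds are illustrative.
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

def extract_closed_shapes(edge_prob: np.ndarray) -> np.ndarray:
    """edge_prob: HxW map in [0, 1], high where an edge is predicted."""
    interior = edge_prob < 0.5                 # pixels unlikely to be edges
    distance = ndi.distance_transform_edt(interior)
    seeds, _ = ndi.label(distance > 5)         # crude seeds far from any edge
    # Flood the edge map from the seeds; basin boundaries settle on the
    # predicted edges, so each basin becomes one closed shape.
    return watershed(edge_prob, markers=seeds, mask=interior)

labels = extract_closed_shapes(np.random.rand(256, 256))  # dummy edge map
```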

Continue reading

The reactive synthesis competition (SYNTCOMP): 2018-2021

Abstract

We report on the last four editions of the reactive synthesis competition (SYNTCOMP 2018-2021). We briefly describe the evaluation scheme and the experimental setup of SYNTCOMP. Then we introduce new benchmark classes that have been added to the SYNTCOMP library and give an overview of the participants of SYNTCOMP. Finally, we present and analyze the results of our experimental evaluations, including a ranking of tools with respect to the quantity and quality of solutions, where quality is measured by the total size of a solution in terms of logic and memory elements.

Continue reading

Unsupervised discovery of interpretable visual concepts

Abstract

Providing interpretability of deep-learning models to non-experts, while fundamental for responsible real-world usage, is challenging. Attribution maps from xAI techniques, such as Integrated Gradients, are a typical example of a visualization technique that carries a high level of information but is difficult to interpret. In this paper, we propose two methods, Maximum Activation Groups Extraction (MAGE) and Multiscale Interpretable Visualization (Ms-IV), to explain the model's decision and enhance global interpretability. MAGE finds, for a given CNN, combinations of features that globally form a semantic meaning, which we call concepts. We group these similar feature patterns by clustering them into concepts, which we visualize through Ms-IV. This latter method is inspired by Occlusion and Sensitivity analysis (incorporating causality) and uses a novel metric, called Class-aware Order Correlation (CAOC), to globally evaluate the most important image regions according to the model's decision space. We compare our approach to xAI methods such as LIME and Integrated Gradients. Experimental results show that Ms-IV achieves higher localization and faithfulness values. Finally, a qualitative evaluation of the combined MAGE and Ms-IV demonstrates humans' ability to agree, based on the visualization, with the decisions reflected by the clusters' concepts, and to detect, among a given set of networks, the existence of bias.
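
For intuition on the concept-grouping step, here is a minimal sketch assuming k-means as the clustering algorithm and mean channel activations as the feature patterns; the paper's actual grouping criterion, the CAOC metric, and the Ms-IV visualization are not reproduced here.

```python
# Illustration only: grouping CNN feature channels into "concepts" by
# clustering, in the spirit of MAGE. k-means and k=10 are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def extract_concepts(activations: np.ndarray, n_concepts: int = 10):
    """activations: (n_images, n_channels) mean activation per channel.
    Channels that fire together across the dataset land in the same
    cluster, which is then treated as one candidate concept."""
    per_channel = activations.T                 # (n_channels, n_images)
    km = KMeans(n_clusters=n_concepts, n_init=10, random_state=0)
    return km.fit_predict(per_channel)          # concept index per channel

acts = np.random.rand(500, 256)                 # dummy: 500 images, 256 channels
print(extract_concepts(acts)[:20])
```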

Continue reading

DiffVersify: A scalable approach to differentiable pattern mining with coverage regularization

By Thibaut Chataing, Julien Perez, Marc Plantevit, Céline Robardet

2024-01-10

In Machine learning and knowledge discovery in databases. Research track - European conference, ECML PKDD 2024, Vilnius, Lithuania, September 9-13, 2024, proceedings, part VI

Abstract

Pattern mining addresses the challenge of automatically identifying interpretable and discriminative patterns within data. Recent approaches, leveraging a differentiable approach through neural autoencoders with class recovery, have achieved encouraging results but tend to fall short as the magnitude of the noise and the number of underlying features in the data increase. Empirically, the number of discovered patterns tends to be limited in these challenging contexts. In this article, we present a differentiable binary model that integrates a new regularization technique to enhance pattern coverage. Besides, we introduce an innovative pattern decoding strategy that takes advantage of non-negative matrix factorization (NMF), extending beyond the conventional thresholding methods prevalent in existing approaches. Experiments on four real-world datasets show the superior performance of DiffVersify in terms of the ROC-AUC metric. On synthetic data, we observe an increase in the similarity between the discovered patterns and the ground truth. Finally, using several metrics to finely evaluate the quality of the patterns with regard to the data, we show the global effectiveness of the approach.
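
To make the NMF decoding idea concrete, here is a minimal sketch assuming scikit-learn's NMF and a 0.5 binarization cut; the paper's actual decoder, model, and coverage regularization are not reproduced.

```python
# Sketch of NMF-based pattern decoding, replacing hard thresholding:
# a non-negative factorization of the learned weights is binarized to
# read off item patterns. The 0.5 cut is an assumption for the example.
import numpy as np
from sklearn.decomposition import NMF

def decode_patterns(weights: np.ndarray, n_patterns: int) -> np.ndarray:
    """weights: non-negative (n_hidden, n_items) matrix from an autoencoder."""
    nmf = NMF(n_components=n_patterns, init="nndsvda", max_iter=500)
    nmf.fit_transform(weights)
    H = nmf.components_                             # (n_patterns, n_items)
    H = H / (H.max(axis=1, keepdims=True) + 1e-9)   # scale rows to [0, 1]
    return H > 0.5                                  # items in each pattern

patterns = decode_patterns(np.abs(np.random.randn(64, 100)), n_patterns=8)
print(patterns.sum(axis=1))                         # items per pattern
```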

Continue reading

Additive margin in contrastive self-supervised frameworks to learn discriminative speaker representations

By Theo Lepage, Reda Dehak

2024-01-01

In The speaker and language recognition workshop (Odyssey 2024)

Abstract

Self-Supervised Learning (SSL) frameworks have become the standard for learning robust class representations by benefiting from large unlabeled datasets. For Speaker Verification (SV), most SSL systems rely on contrastive loss functions. We explore different ways to improve the performance of these techniques by revisiting the NT-Xent contrastive loss. Our main contribution is the definition of the NT-Xent-AM loss and the study of the importance of Additive Margin (AM) in the SimCLR and MoCo SSL methods to further separate positive from negative pairs. Despite class collisions, we show that AM enhances the compactness of same-speaker embeddings and reduces the number of false negatives and false positives on SV. Additionally, we demonstrate the effectiveness of the symmetric contrastive loss, which provides more supervision for the SSL task. Implementing these two modifications to SimCLR improves performance and results in 7.85% EER on VoxCeleb1-O, outperforming other equivalent methods.
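
The additive margin idea is simple enough to sketch. The following is a minimal illustration assuming PyTorch, a margin of 0.1, and a temperature of 0.07; it is not the authors' exact implementation.

```python
# Minimal sketch of NT-Xent with an additive margin on the positive pair,
# in the spirit of NT-Xent-AM. Margin and temperature values are assumed.
import torch
import torch.nn.functional as F

def nt_xent_am(z1, z2, margin=0.1, temperature=0.07):
    """z1, z2: (N, d) embeddings of two views of the same N speakers."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / temperature          # (N, N) scaled cosine similarities
    # Subtract the margin from the positive (diagonal) similarities only,
    # forcing positives to beat negatives by at least `margin`.
    pos = torch.eye(len(z1), dtype=torch.bool, device=z1.device)
    sim = sim - (margin / temperature) * pos
    targets = torch.arange(len(z1), device=z1.device)
    return F.cross_entropy(sim, targets)

loss = nt_xent_am(torch.randn(32, 192), torch.randn(32, 192))
```

The symmetric variant mentioned in the abstract would average this loss computed in both directions, i.e. `(nt_xent_am(z1, z2) + nt_xent_am(z2, z1)) / 2`.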

Continue reading

Interpretable learning of crime in France (2012-2021)

By Nida Meddouri, David Beserra

2024-01-01

In Actes de l’atelier gestion et analyse des données spatiales et temporelles

Abstract

Criminal activity in France has evolved significantly over the last two decades, marked by a resurgence of malicious acts, notably linked to social and union movements, riots, and terrorism. In this difficult context, the use of techniques from artificial intelligence could offer many prospects for strengthening public and private security in France. One example of this approach is the spatio-temporal analysis of crime data, already applied successfully in Brazil (Da Silva et al., 2020), in the Near East (Tolan et al., 2015), and in other countries. In this work, we explore the possibility of applying this approach to the French context.

Continue reading

Concurrent stochastic lossy channel games

By Daniel Stan, Muhammad Najib, Anthony Widjaja Lin, Parosh Aziz Abdulla

2024-01-01

In Proceedings of the 32nd EACSL annual conference on computer science logic (CSL'24), February 19-23, 2024, Naples, Italy

Abstract

Continue reading

Enhanced neonatal screening for sickle cell disease: Human-guided deep learning with CNN on isoelectric focusing images

By Kpangni Alex Jérémie Koua, Cheikh Talibouya Diop, Lamine Diop, Mamadou Diop

2024-01-01

In Journal of Infrastructure, Policy and Development

Abstract

Accurate detection of abnormal hemoglobin variations is paramount for early diagnosis of sickle cell disease (SCD) in newborns. Traditional methods using isoelectric focusing (IEF) with agarose gels are technician-dependent and face limitations like inconsistent image quality and interpretation challenges. This study proposes a groundbreaking solution using deep learning (DL) and artificial intelligence (AI) while ensuring human guidance throughout the process. The system analyzes IEF gel images with convolutional neural networks (CNNs), achieving over 98% accuracy in identifying various SCD profiles, far surpassing the limitations of traditional methods. Furthermore, the system addresses ambiguities by incorporating an “Unconfirmed” category for unclear cases and assigns probability values to each classification, empowering clinicians with crucial information for informed decisions. This AI-powered tool, named SCScreen, seamlessly integrates machine learning with medical expertise, offering a robust, efficient, and accurate solution for SCD screening. Notably, SCScreen tackles the previously challenging diagnosis of major sickle cell syndromes (SDM) in newborns. This research has the potential to revolutionize SCD management. By strengthening screening platforms and potentially reducing costs, SCScreen paves the way for improved healthcare outcomes for newborns with SCD, potentially saving lives and improving the quality of life for affected individuals.
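
The "Unconfirmed" routing described above amounts to confidence thresholding on the classifier's output. Here is a minimal sketch assuming NumPy, an invented 0.90 cutoff, and illustrative hemoglobin profile labels, none of which come from the paper.

```python
# Sketch of the "Unconfirmed" decision rule: the CNN's softmax probabilities
# are reported, and low-confidence cases are flagged for human review.
# The 0.90 cutoff and the profile labels are assumptions for the example.
import numpy as np

PROFILES = ["FA", "FS", "FSC", "FC", "AS"]   # illustrative hemoglobin profiles
THRESHOLD = 0.90

def classify(probs: np.ndarray) -> tuple[str, float]:
    """probs: softmax output of the CNN over the known profiles."""
    best = int(np.argmax(probs))
    if probs[best] < THRESHOLD:
        return "Unconfirmed", float(probs[best])   # route to a clinician
    return PROFILES[best], float(probs[best])

print(classify(np.array([0.55, 0.30, 0.05, 0.05, 0.05])))  # -> Unconfirmed
```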

Continue reading