Publications

Towards attack detection in traffic data based on spectral graph analysis

Abstract

Cyberattacks have become a significant concern for individuals, organizations, and governments. These attacks can take many forms, and the consequences can be severe. Protecting against these threats requires a range of strategies and techniques, such as pattern detection, classification of system behaviors against previously known attacks, and anomaly detection, which can identify unknown forms of attack. Few of these existing techniques seem to fully utilize the potential of mathematical approaches such as spectral graph analysis. This field provides tools that extract important topological features of a graph by computing its Laplacian matrix and the corresponding spectrum. This framework can provide valuable insights into the underlying structure of a network, which can be used to detect cyberthreats. Indeed, significant changes in the topology of the graph result in significant changes in the spectrum of the Laplacian matrix. For this reason, we propose to model the network as a dynamic graph composed of nodes (devices) and edges (requests between devices), to study the evolution of the Laplacian spectrum, and to compute metrics on this evolving spectrum. In this way, we should be able to detect suspicious behaviors that may indicate that an attack is occurring.
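
The idea of tracking the Laplacian spectrum of a dynamic graph can be sketched as follows. This is only an illustrative sketch, not the authors' method: the snapshot graphs, the choice of the unnormalized Laplacian L = D - A, the Euclidean distance between sorted spectra, and the names `laplacian_spectrum` and `spectral_distance` are all assumptions made for the example.

```python
import numpy as np

def laplacian_spectrum(n_nodes, edges):
    """Sorted eigenvalues of the unnormalized Laplacian L = D - A
    of an undirected, unweighted graph (assumed model of one snapshot)."""
    adj = np.zeros((n_nodes, n_nodes))
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u, v] = adj[v, u] = 1.0
    laplacian = np.diag(adj.sum(axis=1)) - adj
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    return np.linalg.eigvalsh(laplacian)

def spectral_distance(spec_a, spec_b):
    """Euclidean distance between two sorted spectra of equal length
    (one possible metric; the abstract does not name a specific one)."""
    return float(np.linalg.norm(spec_a - spec_b))

N_DEVICES = 5
# Hypothetical snapshot 1: devices exchange requests along a ring.
baseline = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
# Hypothetical snapshot 2: device 0 suddenly contacts every other device.
burst = baseline + [(0, 2), (0, 3)]

spec_baseline = laplacian_spectrum(N_DEVICES, baseline)
spec_burst = laplacian_spectrum(N_DEVICES, burst)
drift = spectral_distance(spec_baseline, spec_burst)
print("spectral drift between snapshots:", drift)
```

In a monitoring setting, one would presumably compute such a drift value between consecutive snapshots and flag values that exceed a threshold calibrated on normal traffic; that threshold is not specified here.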

A systemic mapping of methods and tools for performance analysis of data streaming with containerized microservices architecture

By S. Ris, Jean Araujo, David Beserra

2023-03-29

In 18th Iberian conference on information systems and technologies (CISTI’2023)

Abstract

With the Internet of Things (IoT) growth and customer expectations, the importance of data streaming and streaming processing has increased. Data Streaming refers to the concept where data is processed and transmitted continuously and in real-time without necessarily being stored in a physical location. Personal health monitors and home security systems are examples of data streaming sources. This paper presents a systematic mapping study of the performance analysis of Data Streaming systems in the context of Containerization and Microservices. The research aimed to identify the main methods, tools, and techniques used in the last five years for the execution of this type of study. The results show that there are still few performance evaluation studies for this system niche, and there are gaps that must be filled, such as the lack of analytical modeling and the disregard for communication protocols’ influence.

Modern vectorization and alignment of historical maps: An application to Paris atlas (1789-1950)

Abstract

Maps have been a unique source of knowledge for centuries. Such historical documents provide invaluable information for analyzing complex spatial transformations over long time frames. This is particularly true for urban areas, which encompass multiple interleaved research domains: humanities, social sciences, etc. The large number and significant diversity of map sources call for automatic image processing techniques in order to extract the relevant objects as vector features. The complexity of maps (text, noise, digitization artifacts, etc.) has hindered the development of versatile and efficient raster-to-vector approaches for decades. In this thesis, we propose a learnable, reproducible, and reusable solution for the automatic transformation of raster maps into vector objects (building blocks, streets, rivers), focusing on the extraction of closed shapes. Our approach builds on the complementary strengths of convolutional neural networks, which excel at filtering edges but produce outputs with poor topological properties, and mathematical morphology, which offers solid guarantees regarding closed shape extraction but is very sensitive to noise. In order to improve the robustness of deep edge filters to noise, we review several existing topology-preserving loss functions and propose new ones, which improve the topological properties of the results. We also introduce a new contrast convolution (CConv) layer to investigate how architectural changes can impact such properties. Finally, we investigate the different approaches that can be used to implement each stage, and how to combine them in the most efficient way.
Building on this shape extraction pipeline, we propose a new alignment procedure for historical map images, and start to leverage the redundancies contained in map sheets with similar contents to propagate annotations, improve vectorization quality, and eventually detect evolution patterns for later analysis or to automatically assess vectorization quality. To evaluate the performance of all the methods mentioned above, we released a new dataset of annotated historical map images. It is the first public and open dataset targeting the task of historical map vectorization. We hope that, thanks to our publications and our public and open releases of datasets, code, and results, our work will benefit a wide range of historical map-related applications.

A Myhill-Nerode theorem for higher-dimensional automata

By Uli Fahrenberg, Krzysztof Ziemiański

2023-03-05

In Proceedings of the 44th international conference on application and theory of Petri nets and concurrency (PN’23)

Abstract

We establish a Myhill-Nerode type theorem for higher-dimensional automata (HDAs), stating that a language is regular precisely if it has finite prefix quotient. HDAs extend standard automata with additional structure, making it possible to distinguish between interleavings and concurrency. We also introduce deterministic HDAs and show that not all HDAs are determinizable, that is, there exist regular languages that cannot be recognised by a deterministic HDA. Using our theorem, we develop an internal characterisation of deterministic languages.

Catoids and modal convolution algebras

Abstract

We show how modal quantales arise as convolution algebras $Q^X$ of functions from catoids $X$, that is, multisemigroups with a source map $\ell$ and a target map $r$, into modal quantales $Q$, which can be seen as weight or value algebras. In the tradition of boolean algebras with operators we study modal correspondences between algebraic laws in $X$, $Q$ and $Q^X$. The class of catoids we introduce generalises Schweizer and Sklar’s function systems and object-free categories to a setting isomorphic to algebras of ternary relations, as they are used for boolean algebras with operators and substructural logics. Our results provide a generic construction of weighted modal quantales from such multisemigroups. It is illustrated by many examples. We also discuss how these results generalise to a setting that supports reasoning with stochastic matrices or probabilistic predicate transformers.
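
For orientation, the convolution on $Q^X$ mentioned above is commonly defined along the following lines. This is a sketch under assumptions: the symbols $\odot$ for the multioperation of the catoid $X$, $\cdot$ for the multiplication of the quantale $Q$, and $\bigvee$ for its supremum are notation chosen here, and the abstract itself does not spell out the formula.

```latex
% Convolution of f, g : X -> Q, where x ranges over the catoid X and
% y \odot z is the set of composites of y and z in X (assumed notation).
\[
  (f \ast g)(x) \;=\; \bigvee_{x \,\in\, y \odot z} f(y) \cdot g(z)
\]
```

Read this way, the value of $f \ast g$ at $x$ aggregates the weights of all ways of decomposing $x$ into a pair $y$, $z$ whose composite contains $x$.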

Why is the winner the best?

By Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu D. Tizabi, Fabian Isensee, Tim J. Adler, Sharib Ali, Vincent Andrearczyk, Marc Aubreville, Ujjwal Baid, Spyridon Bakas, Niranjan Balu, Sophia Bano, Jorge Bernal, Sebastian Bodenstedt, Alessandro Casella, Veronika Cheplygina, Marie Daum, Marleen de Bruijne, Adrien Depeursinge, Reuben Dorent, Jan Egger, David G. Ellis, Sandy Engelhardt, Melanie Ganz, Noha Ghatwary, Gabriel Girard, Patrick Godau, Anubha Gupta, Lasse Hansen, Kanako Harada, Mattias P. Heinrich, Nicholas Heller, Alessa Hering, Arnaud Huaulmé, Pierre Jannin, Ali Emre Kavur, Oldřich Kodym, Michal Kozubek, Jianning Li, Hongwei Li, Jun Ma, Carlos Martín-Isla, Bjoern Menze, Alison Noble, Valentin Oreiller, Nicolas Padoy, Sarthak Pati, Kelly Payette, Tim Rädsch, Jonathan Rafael-Patiño, Vivek Singh Bawa, Stefanie Speidel, Carole H. Sudre, Kimberlin van Wijnen, Martin Wagner, Donglai Wei, Amine Yamlahi, Moi Hoon Yap, Chun Yuan, Maximilian Zenk, Aneeq Zia, David Zimmerer, Dogu Baran Aydogan, Binod Bhattarai, Louise Bloch, Raphael Brüngel, Jihoon Cho, Chanyeol Choi, Qi Dou, Ivan Ezhov, Christoph M. Friedrich, Clifton D. Fuller, Rebati Raman Gaire, Adrian Galdran, Álvaro García Faura, Maria Grammatikopoulou, SeulGi Hong, Mostafa Jahanifar, Ikbeom Jang, Abdolrahim Kadkhodamohammadi, Inha Kang, Florian Kofler, Satoshi Kondo, Hugo Kuijf, Mingxing Li, Minh Luu, Tomaž Martinčič, Pedro Morais, Mohamed A. Naser, Bruno Oliveira, David Owen, Subeen Pang, Jinah Park, Sung-Hong Park, Szymon Plotka, Élodie Puybareau, Nasir Rajpoot, Kanghyun Ryu, Numan Saeed, Adam Shephard, Pengcheng Shi, Dejan Štepec, Ronast Subedi, Guillaume Tochon, Helena R. Torres, Helene Urien, João L. Vilaça, Kareem A. Wahid, Haojie Wang, Jiacheng Wang, Liansheng Wang, Xiyue Wang, Benedikt Wiestler, Marek Wodzinski, Fangfang Xia, Juanying Xie, Zhiwei Xiong, Sen Yang, Yanwu Yang, Zixuan Zhao, Klaus Maier-Hein, Paul F. Jäger, Annette Kopp-Schneider, Lena Maier-Hein

2023-02-27

In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)

Abstract

International benchmarking competitions have become fundamental for the comparative performance assessment of image analysis methods. However, little attention has been given to investigating what can be learnt from these competitions. Do they really generate scientific progress? What are common and successful participation strategies? What makes a solution superior to a competing method? To address this gap in the literature, we performed a multi-center study with all 80 competitions that were conducted in the scope of IEEE ISBI 2021 and MICCAI 2021. Statistical analyses performed based on comprehensive descriptions of the submitted algorithms linked to their rank as well as the underlying participation strategies revealed common characteristics of winning solutions. These typically include the use of multi-task learning (63%) and/or multi-stage pipelines (61%), and a focus on augmentation (100%), image preprocessing (97%), data curation (79%), and postprocessing (66%). The “typical” lead of a winning team is a computer scientist with a doctoral degree, five years of experience in biomedical image analysis, and four years of experience in deep learning. Two core general development strategies stood out for highly-ranked teams: the reflection of the metrics in the method design and the focus on analyzing and handling failure cases. According to the organizers, 43% of the winning algorithms exceeded the state of the art but only 11% completely solved the respective domain problem. The insights of our study could help researchers (1) improve algorithm development strategies when approaching new problems, and (2) focus on open research questions revealed by this work.

Leveraging neural Koopman operators to learn continuous representations of dynamical systems from scarce data

By Anthony Frion, Lucas Drumetz, Mauro Dalla Mura, Guillaume Tochon, Abdeldjalil Aïssa-El-Bey

2023-02-17

In Proceedings of the 48th IEEE international conference on acoustics, speech, and signal processing (ICASSP)

Abstract

Over the last few years, several works have proposed deep learning architectures to learn dynamical systems from observation data with little or no knowledge of the underlying physics. One line of work relies on learning representations in which the dynamics of the underlying phenomenon can be described by a linear operator, based on Koopman operator theory. However, despite being able to provide reliable long-term predictions for some dynamical systems in ideal situations, the methods proposed so far have limitations, such as requiring the discretization of intrinsically continuous dynamical systems, which leads to data loss, especially when handling incomplete or sparsely sampled data. Here, we propose a new deep Koopman framework that represents dynamics in an intrinsically continuous way, leading to better performance on limited training data, as exemplified on several datasets arising from dynamical systems.

A benchmark for toxic comment classification on Civil Comments dataset

By Corentin Duchêne, Henri Jamet, Pierre Guillaume, Réda Dehak

2023-01-16

In Extraction et gestion des connaissances, EGC 2023, Lyon, France, 16 au 20 janvier 2023

Abstract

Electricity price forecasting based on order books: A differentiable optimization approach

By Léonard Tschora, Tias Guns, Erwan Pierre, Marc Plantevit, Céline Robardet

2023-01-10

In Proceedings of the 10th IEEE international conference on data science and advanced analytics (DSAA’23)

Abstract

We consider day-ahead electricity price forecasting on the European market. In this market, participants can offer electricity for sale or purchase at a specific price by submitting overnight orders. Market operators determine the market clearing price – the price at which the amount of electricity supplied equals the amount of electricity demanded – using the Euphemia balancing algorithm. Euphemia is a quadratic optimization problem that maximizes the social welfare, defined as the sum of the supplier surplus and the consumer surplus, while ensuring a null energy balance. This mechanism deeply influences the price calculation, but has so far received little consideration in electricity price forecasting algorithms. Existing models are generally based on identifying relationships between exogenous characteristics (consumption and production forecasts) and the market clearing price to be predicted. A few studies have taken the Euphemia mechanism into account during prediction, by performing costly manual transformations on order books. In this article, we overcome this limitation by considering the pricing mechanism during model training. For this, we use a predict-and-optimize strategy with differentiable optimization. We design a fully differentiable and scalable solving method for the Euphemia optimization problem and apply it to real-life data from the European Power Exchange (EPEX). We design different model architectures using our differentiable solver and empirically study the impact of taking into account the optimal calculation of prices within the training of the neural network.
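
The notion of a market clearing price can be illustrated with a toy example. The sketch below is not Euphemia and is not differentiable: it simply scans candidate prices on a hypothetical order book of (price, volume) pairs and returns the lowest price at which cumulative supply covers cumulative demand. The function name `clearing_price` and all order values are assumptions made for illustration.

```python
def clearing_price(supply_orders, demand_orders):
    """Return (price, traded_volume) for step-shaped supply and demand curves.

    supply_orders: (min_price, volume) pairs; a seller accepts any price >= min_price.
    demand_orders: (max_price, volume) pairs; a buyer accepts any price <= max_price.
    Returns None if supply never covers demand at any quoted price.
    The traded volume may be 0 when the two curves do not cross.
    """
    candidate_prices = sorted(
        {p for p, _ in supply_orders} | {p for p, _ in demand_orders}
    )
    for price in candidate_prices:
        supplied = sum(v for p, v in supply_orders if p <= price)
        demanded = sum(v for p, v in demand_orders if p >= price)
        if supplied >= demanded:
            return price, min(supplied, demanded)
    return None

# Hypothetical order book (prices in EUR/MWh, volumes in MWh).
supply = [(10, 50), (20, 50), (30, 50)]
demand = [(40, 60), (25, 30), (15, 40)]

result = clearing_price(supply, demand)
print(result)  # (20, 90)
```

At a price of 20, the two cheapest sell orders offer 100 MWh while buyers willing to pay at least 20 request 90 MWh, so the curves cross there. A differentiable solver, as described in the abstract, would presumably replace this discrete scan with an optimization whose solution varies smoothly with the orders.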

Peripheral nervous system responses to food stimuli: Analysis using data science approaches

By Maelle Moranges, Marc Plantevit, Moustafa Bensafi

2023-01-05

In Basic protocols on emotions, senses, and foods

Abstract

In the field of food, as in other fields, the measurement of emotional responses to foods and their sensory properties is a major challenge. In the present protocol, we propose a step-by-step procedure that allows a physiological description of odors, aromas, and their hedonic properties. The method, rooted in subgroup discovery, belongs to the field of data science and, more specifically, data mining. It is still little used in the field of food and is based on descriptive modeling of emotions from human physiological responses.
