Thibaut Chataing

DiffVersify: A scalable approach to differentiable pattern mining with coverage regularization

By Thibaut Chataing, Julien Perez, Marc Plantevit, Céline Robardet

2024-01-10

In Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, ECML PKDD 2024, Vilnius, Lithuania, September 9-13, 2024, Proceedings, Part VI

Abstract

Pattern mining addresses the challenge of automatically identifying interpretable and discriminative patterns within data. Recent approaches, which leverage differentiable learning through neural autoencoders with class recovery, have achieved encouraging results but tend to fall short as the magnitude of the noise and the number of underlying features in the data increase. Empirically, one can observe that the number of discovered patterns tends to be limited in these challenging contexts. In this article, we present a differentiable binary model that integrates a new regularization technique to enhance pattern coverage. In addition, we introduce a pattern decoding strategy based on non-negative matrix factorization (NMF), which extends beyond the conventional thresholding methods prevalent in existing approaches. Experiments on four real-world datasets show that DiffVersify achieves superior performance in terms of the ROC-AUC metric. On synthetic data, we observe an increase in the similarity between the discovered patterns and the ground truth. Finally, using several metrics to finely evaluate the quality of the patterns with regard to the data, we show the overall effectiveness of the approach.
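
To give an intuition for the difference between threshold-based decoding and NMF-based decoding mentioned in the abstract, here is a minimal sketch. It is not the DiffVersify implementation: the toy weight matrix, the function names (`threshold_decode`, `nmf`, `nmf_decode`), the rank, and the relative cut-off `rel_tau` are all assumptions made for illustration, and the NMF solver is the generic Lee-Seung multiplicative update rule rather than whatever solver the paper uses.

```python
import numpy as np

def threshold_decode(weights, tau=0.5):
    """Baseline (assumed): one pattern per hidden unit, keeping features with weight > tau."""
    patterns = []
    for row in weights:
        idx = np.flatnonzero(row > tau)
        if idx.size:
            patterns.append(frozenset(idx.tolist()))
    return patterns

def nmf(X, rank, n_iter=500, seed=0, eps=1e-9):
    """Generic Lee-Seung multiplicative updates for X ~ U @ V with U, V >= 0."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], rank)) + eps
    V = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iter):
        V *= (U.T @ X) / (U.T @ U @ V + eps)
        U *= (X @ V.T) / (U @ V @ V.T + eps)
    return U, V

def nmf_decode(weights, rank, rel_tau=0.5):
    """Hypothetical decoding: factorize the weights, then read one pattern per NMF component."""
    _, V = nmf(weights, rank)
    patterns = []
    for comp in V:
        peak = comp.max()
        if peak > 0:
            # Keep the features that dominate this component (relative cut-off, an assumption).
            patterns.append(frozenset(np.flatnonzero(comp >= rel_tau * peak).tolist()))
    return patterns

# Toy non-negative weights (made up): two clean hidden units and one that mixes both patterns.
W = np.array([
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    [0.6, 0.6, 0.6, 0.6, 0.6, 0.6],
])

print("thresholding:", threshold_decode(W))
print("NMF decoding:", nmf_decode(W, rank=2))
```

On this toy matrix, thresholding reports the mixed third unit as one large pattern covering all six features, whereas a rank-2 factorization has the chance to explain that unit as a combination of the two underlying blocks. Multiplicative updates only guarantee convergence to a local optimum, so the components actually recovered may depend on the initialization.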
