Abstract
Explainable AI seeks to unveil the inner workings of black-box models, either through post-hoc strategies or through self-interpretable models. In this paper, we tackle the problem of building layers that are intrinsically explainable through logic rules. In particular, we address the lack of fidelity and expressivity of current state-of-the-art methods by introducing a transparent explainable logic layer (TELL). We propose to constrain the weights of a feed-forward layer to be positive; combined with particular activation functions, this constraint allows the layer to be translated directly into logic rules. This approach also overcomes a limitation of previous models, which apply to binary data only, by introducing a new way to automatically threshold real values and incorporate the resulting predicates into the logic rules. We show that, compared to the state of the art, TELL achieves similar classification performance while providing higher explanatory power, measured by the agreement between models’ outputs and the activation of the logic explanations. In addition, TELL covers a broader spectrum of applications, since it can be used on real-valued data.