Machine learning explainability via microaggregation and shallow decision trees

Abstract: Artificial intelligence (AI) is being deployed in tasks that are increasingly critical to human life. To build trust in AI and avoid an algorithm-based authoritarian society, automated decisions should be explainable. Explainability is not only a right of citizens, enshrined for example in the European General Data Protection Regulation, but also a desirable goal for engineers, who want to know whether their decision algorithms are capturing the relevant features. For explainability to scale, it must be possible to derive explanations systematically. A common approach is to use simpler, more intuitive decision algorithms to build a surrogate model of the black-box model (for example, a deep learning model) used to make a decision. However, the surrogate model may itself be too large to be truly comprehensible to humans. We focus on explaining black-box models by using decision trees of limited depth as surrogate models. Specifically, we propose an approach based on microaggregation that achieves a trade-off between the comprehensibility and representativeness of the surrogate model on the one hand and the privacy of the subjects whose data were used to train the black-box model on the other.
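The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: records are grouped into clusters of at least k members with a simplified MDAV-style microaggregation heuristic, and each cluster gets its own very shallow surrogate (here a depth-1 decision stump) fitted to the black-box labels. The function names, the toy data, and the `black_box` stand-in are all hypothetical and are not taken from the paper.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def microaggregate(records, k):
    """Partition record indices into clusters of at least k records each.

    Simplified MDAV-style heuristic (an assumption, not the paper's method):
    seed each cluster with the record farthest from the current centroid and
    add its k - 1 nearest neighbours.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    remaining = list(range(len(records)))
    clusters = []
    while len(remaining) >= 2 * k:
        c = centroid([records[i] for i in remaining])
        seed = max(remaining, key=lambda i: dist(records[i], c))
        group = sorted(remaining, key=lambda i: dist(records[i], records[seed]))[:k]
        clusters.append(group)
        taken = set(group)
        remaining = [i for i in remaining if i not in taken]
    clusters.append(remaining)  # the last cluster holds between k and 2k - 1 records
    return clusters


def majority(labels):
    """Most frequent label; ties go to the smallest label."""
    return max(sorted(set(labels)), key=labels.count)


def fit_stump(points, labels):
    """Fit a depth-1 tree; returns (feature, threshold, left_label, right_label)."""
    top = majority(labels)
    best_errors, best = len(labels) - labels.count(top), (None, None, top, top)
    for f in range(len(points[0])):
        for t in sorted({p[f] for p in points}):
            left = [y for p, y in zip(points, labels) if p[f] <= t]
            right = [y for p, y in zip(points, labels) if p[f] > t]
            if not left or not right:
                continue  # a split must send records to both sides
            l_lab, r_lab = majority(left), majority(right)
            errors = len(labels) - left.count(l_lab) - right.count(r_lab)
            if errors < best_errors:
                best_errors, best = errors, (f, t, l_lab, r_lab)
    return best


def predict_stump(stump, point):
    """Apply a fitted stump; a stump without a split predicts its single label."""
    f, t, l_lab, r_lab = stump
    if f is None:
        return l_lab
    return l_lab if point[f] <= t else r_lab


def black_box(point):
    """Hypothetical stand-in for the opaque model being explained."""
    return int(point[0] + point[1] > 10)


# Toy data (made up for illustration only).
records = [(1, 2), (2, 1), (1, 1), (2, 3), (8, 9), (9, 8),
           (9, 9), (8, 7), (5, 5), (6, 5), (5, 6), (6, 6)]
labels = [black_box(r) for r in records]

clusters = microaggregate(records, k=4)
surrogates = [
    fit_stump([records[i] for i in c], [labels[i] for i in c])
    for c in clusters
]
```

In this sketch, k is the knob behind the trade-off the abstract mentions: a larger k means each surrogate is built from more subjects, which favours privacy, while a smaller k lets each shallow tree describe a tighter neighbourhood of the data. A production version would likely use a library decision tree with a depth limit instead of the hand-written stump.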
