A Generic and Model-Agnostic Exemplar Synthetization Framework for Explainable AI

With the growing complexity of deep learning methods adopted in practical applications, there is an increasing and stringent need to explain and interpret the decisions of such methods. In this work, we focus on explainable AI and propose a novel generic and model-agnostic framework for synthesizing input exemplars that maximize a desired response from a machine learning model. To this end, we use a generative model, which acts as a prior for generating data, and traverse its latent space using a novel evolutionary strategy with momentum updates. Our framework is generic because (i) it can employ any underlying generator, e.g. Variational Auto-Encoders (VAEs) or Generative Adversarial Networks (GANs), and (ii) it can be applied to any input data, e.g. images, text samples or tabular data. Since we use a zero-order optimization method, our framework is model-agnostic, in the sense that the machine learning model that we aim to explain is a black box. We stress that our novel framework requires neither access to nor knowledge of the internal structure or the training data of the black-box model. We conduct experiments with two generative models, VAEs and GANs, and synthesize exemplars for various data formats (image, text and tabular), demonstrating that our framework is generic. We also employ our prototype synthetization framework on various black-box models, for which we only know the input and the output formats, showing that it is model-agnostic. Moreover, we compare our framework (available at this https URL) with a model-dependent approach based on gradient descent, showing that our framework obtains equally good exemplars in a shorter computational time.
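Since the abstract only outlines the method (a pretrained generator used as a data prior, with its latent space searched by a zero-order evolutionary strategy with momentum updates), the following Python snippet is a minimal sketch of that idea rather than the authors' exact algorithm. It uses a standard antithetic evolution-strategies estimator; the generator and black_box callables, and every hyper-parameter value, are hypothetical placeholders.

import numpy as np

def synthesize_exemplar(generator, black_box, target_class,
                        latent_dim=64, pop_size=50, sigma=0.1,
                        lr=0.05, momentum=0.9, n_iters=200):
    # Evolve a latent code z so that the sample decoded by `generator`
    # maximizes the black-box score for `target_class`. Only forward
    # passes through `black_box` are used (zero-order optimization),
    # so no gradients or internals of the model are needed.
    rng = np.random.default_rng(0)
    z = rng.standard_normal(latent_dim)   # current latent estimate
    velocity = np.zeros(latent_dim)       # momentum buffer

    def fitness(code):
        # Score the exemplar decoded from `code`; `black_box` is assumed
        # to return a vector of class probabilities.
        return black_box(generator(code))[target_class]

    for _ in range(n_iters):
        # Mirrored Gaussian perturbations around the current latent code.
        eps = rng.standard_normal((pop_size, latent_dim))
        f_pos = np.array([fitness(z + sigma * e) for e in eps])
        f_neg = np.array([fitness(z - sigma * e) for e in eps])
        # Antithetic evolution-strategies estimate of the ascent direction.
        grad_est = ((f_pos - f_neg)[:, None] * eps).mean(axis=0) / (2.0 * sigma)
        # Momentum update on the latent code.
        velocity = momentum * velocity + lr * grad_est
        z = z + velocity

    # Decode the optimized latent code into the synthesized exemplar.
    return generator(z)

Mirrored (antithetic) sampling reduces the variance of the zero-order gradient estimate, and the momentum buffer smooths successive noisy updates; because only forward passes through the black box are required, this kind of search is what makes the approach model-agnostic.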
