Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections

The paper describes a new algorithm for generating minimal, stable, and symbolic corrections to an input: small changes that cause a neural network with ReLU activations to change its output. We argue that such corrections are a useful way to give a user feedback when the network produces an output other than the desired one. The algorithm generates a correction by solving a series of linear constraint satisfaction problems, exploiting the fact that a ReLU network is piecewise linear: once an activation pattern is fixed, the network behaves as an affine function of its input. The technique is evaluated on a neural network trained to predict whether an applicant will pay off a mortgage.
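
To make the connection to linear programming concrete, here is a minimal sketch of one such step for a one-hidden-layer ReLU scorer f(x) = w2 · relu(W1 x + b1) + b2. With the activation pattern at the current input frozen, the network is affine in its input, so a smallest (L1-norm) correction that pushes the score above a decision margin can be found with a single LP. The network shape, the L1 objective, the margin `eps`, and the helper name `minimal_correction` are illustrative assumptions, not the paper's exact formulation; the paper's algorithm solves a series of such problems, while this sketch shows only one.

```python
# A minimal sketch (assumed setup, not the paper's implementation): with the
# ReLU activation pattern frozen, finding a smallest L1 correction that flips
# the network's judgment reduces to one linear program.
import numpy as np
from scipy.optimize import linprog

def minimal_correction(W1, b1, w2, b2, x, eps=1e-3):
    """Find a small delta s.t. f(x+delta) >= eps under x's activation pattern.

    f(x) = w2 @ relu(W1 @ x + b1) + b2, a single-hidden-layer ReLU scorer.
    Returns delta, or None if no correction exists in this linear region.
    """
    h = W1 @ x + b1                # pre-activations at the current input
    a = (h > 0).astype(float)      # frozen ReLU activation pattern
    n, m = len(x), len(b1)

    # Variables v = [delta (n), t (n)]; minimize sum(t) = ||delta||_1.
    c = np.concatenate([np.zeros(n), np.ones(n)])

    rows, rhs = [], []
    # t_i >= |delta_i|  <=>  delta_i - t_i <= 0  and  -delta_i - t_i <= 0
    I = np.eye(n)
    rows.append(np.hstack([I, -I]))
    rhs.append(np.zeros(n))
    rows.append(np.hstack([-I, -I]))
    rhs.append(np.zeros(n))
    # Stay inside the linear region: active units stay >= 0, inactive <= 0.
    sign = 1.0 - 2.0 * a           # -1 for active rows, +1 for inactive rows
    rows.append(np.hstack([sign[:, None] * W1, np.zeros((m, n))]))
    rhs.append(-sign * h)
    # Flip the judgment: f(x + delta) >= eps, linear once the pattern is fixed.
    g = (w2 * a) @ W1              # gradient of f within this region
    rows.append(np.hstack([-g, np.zeros(n)])[None, :])
    rhs.append(np.array([(w2 * a) @ h + b2 - eps]))

    res = linprog(c, A_ub=np.vstack(rows), b_ub=np.concatenate(rhs),
                  bounds=[(None, None)] * n + [(0, None)] * n)
    return res.x[:n] if res.success else None

# Illustrative usage with random weights standing in for a trained network.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
w2, b2 = rng.normal(size=8), -1.0
x = rng.normal(size=4)
delta = minimal_correction(W1, b1, w2, b2, x)
```

If the LP is infeasible, no correction exists inside the current linear region, and a search would move on to a neighboring activation pattern, which is the "series" of linear problems the abstract refers to.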
