Evaluating Adversarial Robustness for Deep Neural Network Interpretability using fMRI Decoding

Deep neural networks (DNNs) are increasingly used to make predictions from high-dimensional, complex data, yet they are widely seen as uninterpretable "black boxes" because it is difficult to discover what input information a network uses to make a prediction. The ability to identify this information is particularly important for applications in cognitive neuroscience and neuroinformatics. Saliency maps are a common approach for producing interpretable visualizations of the relative importance of input features for a prediction. However, many methods for creating these maps fail because they focus too much on the input or are extremely sensitive to small input noise. It is also challenging to quantitatively evaluate how well a saliency map corresponds to the truly relevant input information. In this paper, we develop two quantitative evaluation procedures for saliency methods. Both rely on the fact that the Human Connectome Project (HCP) dataset contains functional magnetic resonance imaging (fMRI) data from multiple tasks per subject, which allows us to create ground truth saliency maps. We then introduce an adversarial training method that makes DNNs robust to small input noise, and demonstrate that it measurably improves interpretability.
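To make the three ingredients named above concrete (a gradient-based saliency map, training on adversarially perturbed inputs, and scoring a saliency map against a ground truth map), the following is a minimal sketch on a toy logistic-regression "decoder". It is not the paper's model, adversarial training method, or evaluation procedure. The fast gradient sign perturbation, the top-k overlap score, and all function names and numbers are assumptions chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of the positive class for a toy logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def saliency(w, b, x):
    """Gradient saliency |dp/dx_i|, which is p * (1 - p) * |w_i| here."""
    p = predict(w, b, x)
    return [abs(p * (1.0 - p) * wi) for wi in w]

def sign(g):
    return (g > 0) - (g < 0)

def fgsm(w, b, x, y, eps):
    """Fast gradient sign perturbation (assumed attack, for illustration).

    The cross-entropy gradient with respect to the input is (p - y) * w_i.
    """
    p = predict(w, b, x)
    return [xi + eps * sign((p - y) * wi) for wi, xi in zip(w, x)]

def adversarial_step(w, b, x, y, eps, lr):
    """One gradient step on the loss evaluated at the perturbed input."""
    x_adv = fgsm(w, b, x, y, eps)
    err = predict(w, b, x_adv) - y
    w_new = [wi - lr * err * xi for wi, xi in zip(w, x_adv)]
    return w_new, b - lr * err

def top_k_overlap(sal, truth_mask, k):
    """Fraction of the k most salient features inside a 0/1 ground truth
    mask (hypothetical metric, not taken from the paper)."""
    top = sorted(range(len(sal)), key=lambda i: sal[i], reverse=True)[:k]
    return sum(truth_mask[i] for i in top) / k

# Toy usage: three "voxels", the middle one irrelevant to the model.
w, b = [2.0, 0.0, -1.0], 0.0
x, y = [1.0, 1.0, 1.0], 1
sal = saliency(w, b, x)
x_adv = fgsm(w, b, x, y, eps=0.1)
w2, b2 = adversarial_step(w, b, x, y, eps=0.1, lr=0.5)
score = top_k_overlap(sal, [1, 0, 1], k=2)
```

In this toy setting the ground truth mask is known by construction. In the setting described above, the ground truth would instead come from the multi-task fMRI data, and the details of how it is built are not given in this section.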
