Biologically inspired protection of deep networks from adversarial attacks

Inspired by biophysical principles underlying nonlinear dendritic computation in neural circuits, we develop a scheme for training deep neural networks to make them robust to adversarial attacks. Our scheme generates highly nonlinear, saturated neural networks that achieve state-of-the-art performance against gradient-based adversarial examples on MNIST, despite never being exposed to adversarially chosen examples during training. Moreover, these networks exhibit unprecedented robustness to targeted, iterative schemes for generating adversarial examples, including second-order methods. Drawing on methods from information geometry, we further identify principles governing how these networks achieve their robustness. We find that these networks progressively create highly flat and compressed internal representations that are sensitive to very few input dimensions, while still solving the task. They also employ highly kurtotic weight distributions, as found in the brain, and we demonstrate how such kurtosis can protect even linear classifiers from adversarial attack.
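
To illustrate the last point, here is a minimal numpy sketch (not the paper's experiment; the dimensions, sparsity level, and variable names are illustrative assumptions) of why a heavy-tailed, kurtotic weight vector limits the damage an L-infinity-bounded attacker can do to a linear classifier: for a score s(x) = w·x, the worst-case shift under a perturbation with ‖δ‖∞ ≤ ε is ε‖w‖₁, which, at fixed signal strength ‖w‖₂, is smallest when the weight mass is concentrated on a few coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1000    # input dimension (illustrative)
eps = 0.1   # L-infinity budget of the attacker

# Dense Gaussian weights vs. a kurtotic alternative whose mass
# sits on a handful of coordinates (heavy-tailed distribution).
w_dense = rng.normal(size=d)
w_kurt = np.zeros(d)
support = rng.choice(d, size=20, replace=False)
w_kurt[support] = rng.normal(size=20)

# Equalize signal strength: give both classifiers the same L2 norm.
w_dense /= np.linalg.norm(w_dense)
w_kurt /= np.linalg.norm(w_kurt)

# Worst-case change in the linear score w.x under ||delta||_inf <= eps
# is eps * ||w||_1 (attained by delta = eps * sign(w)).
print("worst-case score shift, dense weights:   ", eps * np.abs(w_dense).sum())
print("worst-case score shift, kurtotic weights:", eps * np.abs(w_kurt).sum())
```

Running this shows the kurtotic (sparse, heavy-tailed) weight vector suffers a far smaller worst-case score shift than the dense Gaussian one at equal L2 norm, which is the intuition behind the protective effect described above.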
