Simple and Effective Stochastic Neural Networks

Stochastic neural networks (SNNs) are an active research topic, with several paradigms under investigation, including dropout, Bayesian neural networks, the variational information bottleneck (VIB), and noise-regularized learning. These variants affect several practical considerations: generalization, network compression, robustness to adversarial attacks and label noise, and model calibration. However, many existing networks are complicated and expensive to train, and/or address only one or two of these considerations. In this paper we propose a simple and effective stochastic neural network (SE-SNN) architecture for discriminative learning that directly models activation uncertainty and encourages high activation variability. Compared to existing SNNs, the SE-SNN is simpler to implement and faster to train, and it produces state-of-the-art results on network compression by pruning, adversarial defense, learning with label noise, and model calibration.
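To make the core idea concrete, the sketch below shows one plausible reading of "directly modeling activation uncertainty and encouraging high activation variability": a layer predicts a mean and log-variance per activation, samples via the reparameterization trick, and a regularizer rewards large activation variance. This is a minimal illustration under stated assumptions, not the paper's exact formulation; the names (StochasticLayer, variability_bonus), the Gaussian parameterization, and the regularizer weight are all illustrative.

```python
# Illustrative sketch only: one plausible instantiation of a stochastic
# layer that models per-activation uncertainty. Names and the regularizer
# form are assumptions, not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticLayer(nn.Module):
    """Linear layer whose output is a Gaussian sample per activation."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.mu = nn.Linear(in_features, out_features)       # activation means
        self.log_var = nn.Linear(in_features, out_features)  # activation log-variances

    def forward(self, x):
        mu = self.mu(x)
        log_var = self.log_var(x)
        if self.training:
            # Reparameterization trick: sample eps ~ N(0, I), then shift and
            # scale, so the sampling step stays differentiable.
            eps = torch.randn_like(mu)
            out = mu + torch.exp(0.5 * log_var) * eps
        else:
            out = mu  # deterministic forward pass at test time
        return out, log_var

def variability_bonus(log_var):
    """Reward high activation variance; subtracted from the task loss
    (weight 0.1 below is an arbitrary illustrative choice)."""
    return log_var.mean()

# Usage sketch: classification loss minus a small variability bonus.
layer = StochasticLayer(784, 256)
head = nn.Linear(256, 10)
x = torch.randn(32, 784)
y = torch.randint(0, 10, (32,))
h, log_var = layer(x)
logits = head(F.relu(h))
loss = F.cross_entropy(logits, y) - 0.1 * variability_bonus(log_var)
loss.backward()
```

Training on sampled activations while testing on the means is the same train/test split used by most noise-based regularizers (e.g., dropout), which keeps inference deterministic and cheap.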
