Inherent Weight Normalization in Stochastic Neural Networks

Multiplicative stochasticity such as Dropout improves the robustness and generalizability of deep neural networks. Here, we further demonstrate that always-on multiplicative stochasticity combined with simple threshold neurons is sufficient for building deep neural networks. We call such models Neural Sampling Machines (NSM). We find that the activation probability of the NSM exhibits a self-normalizing property that mirrors Weight Normalization, a previously studied mechanism that fulfills many of the features of Batch Normalization in an online fashion. This normalization of activities during training speeds up convergence by preventing the internal covariate shift caused by changes in the input distribution. The always-on stochasticity of the NSM confers the following advantages: the network is identical in the inference and learning phases, making the NSM suitable for online learning; it can exploit stochasticity inherent to a physical substrate, such as analog non-volatile memories, for in-memory computing; and it is suitable for Monte Carlo sampling, while requiring almost exclusively addition and comparison operations. We demonstrate NSMs on standard classification benchmarks (MNIST and CIFAR) and event-based classification benchmarks (N-MNIST and DVS Gestures). Our results show that NSMs perform comparably to or better than conventional artificial neural networks with the same architecture.
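The following Python sketch illustrates the mechanism the abstract describes: a fully connected layer with always-on multiplicative Bernoulli noise on the synapses, followed by a hard threshold neuron. It assumes a PyTorch environment; the class name NSMLinear, the parameter p_active, and the weight initialization are illustrative choices rather than the paper's reference implementation, and training such a layer would additionally require a surrogate (e.g., straight-through) gradient for the non-differentiable threshold.

import torch
import torch.nn as nn

class NSMLinear(nn.Module):
    """Linear layer with always-on multiplicative Bernoulli ("blank-out")
    noise on the synapses, followed by a hard threshold neuron."""

    def __init__(self, in_features, out_features, p_active=0.5):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.p_active = p_active  # probability that a synapse is switched on

    def forward(self, x):
        # Sample an independent multiplicative mask for every synapse,
        # during both training and inference (always-on stochasticity).
        mask = torch.bernoulli(torch.full_like(self.weight, self.p_active))
        pre_activation = x @ (mask * self.weight).t()
        # Simple threshold neuron: fire (1) if the noisy pre-activation
        # crosses zero, stay silent (0) otherwise.
        return (pre_activation > 0).float()

# Usage: layer = NSMLinear(784, 100); y = layer(torch.rand(32, 784))  # -> (32, 100) binary outputs

Intuitively, because the variance of the noisy pre-activation grows with the norm of each row of the weight matrix, the firing probability depends on the weights roughly through w.x / ||w||, which is the same reparameterization that Weight Normalization applies explicitly; this is the self-normalizing property referred to above.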
