Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks

In networks with binary activations and/or binary weights, training by gradient descent is complicated because the model has a piecewise constant response. We consider stochastic binary networks, obtained by adding noise in front of the activations. The expected model response then becomes a smooth function of the parameters; its gradient is well defined but challenging to estimate accurately. We propose a new method for this estimation problem that combines sampling and analytic approximation steps. The method has significantly reduced variance at the price of a small bias, which gives a very practical tradeoff compared with existing unbiased and biased estimators. We further show that one extra linearization step leads to a deep straight-through estimator, previously known only as an ad-hoc heuristic. We experimentally demonstrate higher accuracy in gradient estimation and more stable, better-performing training of deep convolutional models with both proposed methods.
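
To make the setting concrete, below is a minimal PyTorch sketch of the basic object the abstract describes: a stochastic binary (sign) activation whose expected output is a smooth function of its pre-activation, together with a straight-through-style backward pass that uses the derivative of that smooth expectation as a surrogate gradient. This is only an illustration of the setting under assumed logistic noise; it is not the paper's sample-analytic estimator, and the class name `StochasticSignST` is introduced here purely for the example.

```python
import torch


class StochasticSignST(torch.autograd.Function):
    """Stochastic sign activation with a straight-through-style surrogate gradient.

    Forward: z = sign(a + noise) with logistic noise, so P(z = +1) = sigmoid(a)
    and the expected response E[z] = 2*sigmoid(a) - 1 is smooth in a.
    Backward: gradient of the smooth expectation is used in place of the
    (almost everywhere zero) gradient of the sample.
    """

    @staticmethod
    def forward(ctx, a):
        # Sample logistic noise: if u ~ Uniform(0, 1), then log(u / (1 - u)) ~ Logistic(0, 1).
        u = torch.rand_like(a)
        noise = torch.log(u) - torch.log1p(-u)
        ctx.save_for_backward(a)
        # Binary output in {-1, +1}.
        return torch.where(a + noise >= 0, torch.ones_like(a), -torch.ones_like(a))

    @staticmethod
    def backward(ctx, grad_out):
        (a,) = ctx.saved_tensors
        # Derivative of E[z] = 2*sigmoid(a) - 1 with respect to a.
        p = torch.sigmoid(a)
        return grad_out * 2 * p * (1 - p)


# Usage: gradients flow through the binary samples via the smooth surrogate.
a = torch.randn(5, requires_grad=True)
z = StochasticSignST.apply(a)
z.sum().backward()
print(z, a.grad)
```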
