Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks
