Binarized Neural Networks

We introduce a method to train Binarized Neural Networks (BNNs): neural networks with binary weights and activations at run-time. At training time, the binary weights and activations are used for computing the parameter gradients. During the forward pass, BNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations, which is expected to substantially improve power efficiency. To validate the effectiveness of BNNs, we conducted two sets of experiments on the Torch7 and Theano frameworks. On both, BNNs achieved nearly state-of-the-art results on the MNIST, CIFAR-10 and SVHN datasets. We also report preliminary results on the challenging ImageNet dataset. Last but not least, we wrote a binary matrix multiplication GPU kernel with which our MNIST BNN runs 7 times faster than with an unoptimized GPU kernel, without any loss in classification accuracy. The code for training and running our BNNs is available online.
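To illustrate the bit-wise arithmetic the abstract alludes to, the sketch below shows how the dot product of two {-1, +1} vectors reduces to an XNOR followed by a population count. This is a minimal NumPy mock-up of the idea under our own naming (binarize, pack_bits, xnor_popcount_dot are hypothetical helpers), not the paper's released Torch7/Theano code or its binary matrix multiplication GPU kernel.

```python
import numpy as np

def binarize(x):
    """Deterministic binarization: map real values to {-1, +1} (sign, with 0 -> +1)."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def pack_bits(b):
    """Pack a {-1, +1} vector into bits: +1 -> 1, -1 -> 0 (8 elements per byte)."""
    return np.packbits((b > 0).astype(np.uint8))

def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n.

    XNOR marks the positions where the vectors agree; counting those set
    bits (popcount) recovers the signed inner product:
    dot = (#agreements) - (#disagreements) = 2 * popcount - n.
    """
    agreements = np.unpackbits(~(a_bits ^ b_bits))[:n]  # slice off packbits padding
    return 2 * int(agreements.sum()) - n

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = binarize(rng.standard_normal(100))
    w = binarize(rng.standard_normal(100))
    fast = xnor_popcount_dot(pack_bits(a), pack_bits(w), 100)
    exact = int(a.astype(np.int32) @ w.astype(np.int32))
    assert fast == exact
    print(fast)
```

In an optimized GPU kernel the same idea would operate on whole machine words with a hardware population-count instruction rather than bit-by-bit as in this NumPy illustration, which is what makes the bit-wise formulation fast in practice.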
