GXNOR-Net: Training deep neural networks with ternary weights and activations without full-precision memory under a unified discretization framework

Although deep neural networks (DNNs) have become a driving force behind the AI era, their notoriously large hardware overhead has challenged their applications. Recently, several binary and ternary networks, in which the costly multiply-accumulate operations can be replaced by accumulations or even binary logic operations, have made on-chip training of DNNs quite promising. There is therefore a pressing need for an architecture that subsumes these networks under a unified framework while achieving both higher performance and lower overhead. To this end, two fundamental issues must be addressed. The first is how to implement back propagation when neuronal activations are discrete. The second is how to remove the full-precision hidden weights in the training phase to break the memory/computation bottlenecks. To address the first issue, we present a multi-step neuronal activation discretization method and a derivative approximation technique that enable the back-propagation algorithm to be implemented on discrete DNNs. For the second issue, we propose a discrete state transition (DST) methodology that constrains the weights in a discrete space without saving hidden weights. In this way, we build a unified framework that subsumes binary and ternary networks as special cases, under which a heuristic training algorithm is provided at https://github.com/AcrossV/Gated-XNOR. In particular, we find that when both the weights and the activations become ternary, the DNNs reduce to sparse binary networks, termed gated XNOR networks (GXNOR-Nets), since only the event of a non-zero weight meeting a non-zero activation enables the control gate to trigger the XNOR logic operations of the original binary networks. This promises event-driven hardware design for efficient mobile intelligence. We achieve competitive performance compared with state-of-the-art algorithms. Furthermore, both the computational sparsity and the number of states in the discrete space can be flexibly adjusted to suit various hardware platforms.
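To make the gating idea concrete, the sketch below (not taken from the paper; the `ternarize` helper and its 0.5 threshold are illustrative assumptions) shows how a dot product over ternary weights and activations reduces to event-driven XNOR-and-accumulate: only positions where both operands are non-zero trigger any work, and each such event contributes +1 or -1 depending on whether the signs agree.

```python
import numpy as np

def ternarize(x, threshold=0.5):
    """Quantize values to {-1, 0, +1}; the fixed threshold is an illustrative choice."""
    q = np.zeros_like(x, dtype=np.int8)
    q[x > threshold] = 1
    q[x < -threshold] = -1
    return q

def gated_xnor_dot(ternary_weights, ternary_activations):
    """Event-driven ternary dot product.

    Only positions where both the weight and the activation are non-zero
    open the control gate; each such event contributes +1 when the signs
    agree (XNOR) and -1 when they disagree. All other positions are
    skipped, which is where the computational sparsity comes from.
    """
    acc = 0
    for w, a in zip(ternary_weights, ternary_activations):
        if w != 0 and a != 0:           # gate: non-zero weight AND non-zero activation
            acc += 1 if w == a else -1  # XNOR on the signs
    return acc

# The gated result matches the dense dot product of the ternary vectors.
w = ternarize(np.array([0.9, -0.2, -0.8, 0.1]))
a = ternarize(np.array([0.7, 0.6, -0.9, 0.0]))
assert gated_xnor_dot(w, a) == int(np.dot(w, a))
```

In hardware terms, the gate condition means a processing element only fires when an event (non-zero weight and non-zero activation) arrives, which is the basis of the event-driven design mentioned above.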
