In-Hardware Training Chip Based on CMOS Invertible Logic for Machine Learning

Deep Neural Networks (DNNs) have recently achieved state-of-the-art results in various applications, such as computer vision and recognition tasks. DNN inference engines can be implemented in hardware with high energy efficiency, since the computation can be carried out with low-precision fixed-point or even binary arithmetic while maintaining sufficient recognition accuracy. In contrast, training DNNs with the well-known back-propagation algorithm requires high-precision floating-point computation on CPUs and/or GPUs, causing significant power dissipation (hundreds of kilowatts or more) and long training times (several days or more). In this paper, we demonstrate a training chip for machine learning fabricated in a commercial 65-nm CMOS technology. The chip trains without back-propagation: using invertible logic with stochastic computing, it obtains weight values directly from input/output training data at a low precision suitable for inference. When training neurons that compute the weighted sum of all inputs and then apply a non-linear activation function, our chip reduces power dissipation and latency by 99.98% and 99.95%, respectively, compared with a state-of-the-art software implementation.
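
To make the underlying principle concrete, the following minimal Python sketch simulates invertible logic with probabilistic bits (p-bits), the device model on which such designs build. Everything here is an illustrative assumption for exposition (the function name sample, the pseudo-inverse-temperature i0, the step count), not the fabricated chip's stochastic-computing implementation, which realizes similar dynamics with hardware bitstreams rather than floating-point arithmetic. The sketch encodes an AND gate as an Ising ground-state problem: clamping the inputs produces the output (forward operation), while clamping the output makes the free bits sample all consistent inputs. Training exploits the same inversion, clamping the training inputs and outputs and reading the weights off the remaining free bits.

    import numpy as np

    rng = np.random.default_rng(0)

    # Ising couplings (J) and biases (h) for an AND gate on spins
    # (A, B, C), C = A AND B: every valid input/output combination is
    # a ground state of E(m) = -h.m - 0.5 * m.J.m with m_i in {-1, +1}.
    J = np.array([[0., -1., 2.],
                  [-1., 0., 2.],
                  [2., 2., 0.]])
    h = np.array([1., 1., -2.])

    def sample(clamp, i0=2.0, steps=2000):
        """Gibbs-style p-bit updates; `clamp` maps bit index -> fixed spin."""
        m = rng.choice([-1.0, 1.0], size=3)
        for idx, val in clamp.items():
            m[idx] = val
        counts = {}
        for _ in range(steps):
            for i in range(3):
                if i in clamp:
                    continue
                # p-bit update rule: P(m_i = +1) = (1 + tanh(i0 * I_i)) / 2
                I = h[i] + J[i] @ m
                m[i] = 1.0 if rng.uniform(-1, 1) < np.tanh(i0 * I) else -1.0
            key = tuple(int(s > 0) for s in m)  # record visited (A, B, C)
            counts[key] = counts.get(key, 0) + 1
        return counts

    # Forward mode: clamp inputs A = 1, B = 1 -> the network settles to C = 1.
    print(sample({0: +1, 1: +1}))
    # Inverted mode: clamp output C = 0 -> the free input bits fluctuate
    # among the three consistent assignments (0,0), (0,1) and (1,0).
    print(sample({2: -1}))

In the second call, the three input assignments consistent with C = 0 each appear with high frequency, while the inconsistent state (1,1) is exponentially suppressed in energy; increasing i0 plays the role of annealing the network toward a deterministic answer.
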
