A 41.3/26.7 pJ per Neuron Weight RBM Processor Supporting On-Chip Learning/Inference for IoT Applications

This paper proposes an energy-efficient restricted Boltzmann machine (RBM) processor (RBM-P) that supports both on-chip learning and inference for machine learning and Internet of Things (IoT) applications. The RBM structure is applied to both supervised and unsupervised learning, and a multi-layer neural network (NN) can be constructed and initialized by stacking multiple RBMs. The proposed system integrates 32 RBM cores supporting up to 4k neurons per layer and 128 candidates per sample. Its key features are NN model reduction to save external memory bandwidth, a low-power neuron binarizer (LPNB) with dynamic clock gating and area-efficient NN-like activation function calculators for power reduction, a user-defined connection map (UDCM) that saves both computation time and bandwidth, and an early stopping (ES) mechanism for the learning process. Implemented in 65-nm CMOS technology, the RBM-P chip integrates 2.2M gates and 128 kB of SRAM in an 8.8 mm² die area. Operating at 1.2 V and 210 MHz, the chip processes 7.53G and 11.63G neuron weights (NWs) per second at 41.3 and 26.7 pJ per NW for learning and inference, respectively.
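For context, RBM learning of the kind the abstract describes is commonly performed with one-step contrastive divergence (CD-1). The sketch below is a minimal software analogue of that algorithm, assuming standard binary visible and hidden units with a logistic sigmoid activation; the function name `cd1_step` and all of its parameters are illustrative choices, not the chip's actual dataflow, fixed-point arithmetic, or API. The hidden-unit sampling step plays the same role as the chip's neuron binarizer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.01, rng=np.random):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    v0   : (batch, n_visible) binary input samples
    W    : (n_visible, n_hidden) weight matrix
    b, c : visible and hidden bias vectors
    """
    # Positive phase: compute hidden probabilities from the data,
    # then binarize by sampling (the role of a neuron binarizer).
    h0_prob = sigmoid(v0 @ W + c)
    h0 = (rng.random_sample(h0_prob.shape) < h0_prob).astype(v0.dtype)

    # Negative phase: one Gibbs step back to the visible layer and up again.
    v1_prob = sigmoid(h0 @ W.T + b)
    v1 = (rng.random_sample(v1_prob.shape) < v1_prob).astype(v0.dtype)
    h1_prob = sigmoid(v1 @ W + c)

    # Gradient approximation: data correlations minus model correlations.
    batch = v0.shape[0]
    W += lr * (v0.T @ h0_prob - v1.T @ h1_prob) / batch
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b, c
```

Stacking multiple RBMs, as described above, then amounts to training one RBM, freezing its weights, and feeding its hidden activations as the visible inputs of the next RBM, initializing a multi-layer NN one layer at a time.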
