In-Memory Computation of a Machine-Learning Classifier in a Standard 6T SRAM Array

This paper presents a machine-learning classifier in which computations are performed in a standard 6T SRAM array that also stores the machine-learning model. Peripheral circuits implement mixed-signal weak classifiers via columns of the SRAM, and a training algorithm combines multiple columns through boosting to form a strong classifier that also overcomes circuit nonidealities. A prototype 128 × 128 SRAM array, implemented in a 130-nm CMOS process, demonstrates ten-way classification of MNIST images (using image-pixel features downsampled from 28 × 28 = 784 to 9 × 9 = 81, which yields a baseline accuracy of 90%). In SRAM mode (bit-cell read/write), the prototype operates at up to 300 MHz; in classify mode, it operates at 50 MHz, generating a classification every cycle. With accuracy equivalent to a discrete SRAM/digital-MAC system, the system achieves ten-way classification at an energy of 630 pJ per decision, 113 times lower than a discrete system with the standard training algorithm and 13 times lower than a discrete system with the proposed training algorithm.
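The core idea of the abstract is that each SRAM column acts as a weak classifier and boosting combines several columns into a strong one while absorbing circuit errors. The following Python sketch illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes NumPy, binary labels in {-1, +1}, and each column modeled as a linear classifier whose weights are rounded to `bits`-bit signed integers. The weak learner (weighted least squares followed by rounding) and the nonideality model (additive Gaussian noise scaled by the weight norm) are illustrative choices, and all function names are hypothetical. Each round's error is measured on the noisy column outputs, so the boosting reweighting responds to the modeled nonidealities rather than to ideal arithmetic.

```python
import numpy as np


def quantize(w, bits):
    """Round weights to signed integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1].

    Hypothetical stand-in for the low-resolution weights held in SRAM bit cells.
    """
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w))
    if scale == 0:
        return np.zeros_like(w)
    return np.round(w / scale * levels)


def column_output(Xb, wq, noise_std, rng):
    """One column as a noisy dot product (hypothetical additive-Gaussian nonideality)."""
    ideal = Xb @ wq
    return ideal + rng.normal(0.0, noise_std * np.linalg.norm(wq), size=ideal.shape)


def add_bias(X):
    """Append a constant-1 feature so each column can learn an offset."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def train_boosted_columns(X, y, n_columns=8, bits=5, noise_std=0.1, seed=0):
    """Binary AdaBoost over quantized linear weak classifiers; y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    Xb = add_bias(X)
    d = np.full(len(y), 1.0 / len(y))  # boosting sample weights
    columns, alphas = [], []
    for _ in range(n_columns):
        # Weak learner: weighted least squares, then rounding to `bits` bits
        # (an illustrative choice, not the authors' training method).
        sw = np.sqrt(d)
        w, *_ = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)
        wq = quantize(w, bits)
        # Evaluate on noisy outputs so the reweighting sees the nonidealities.
        h = np.sign(column_output(Xb, wq, noise_std, rng))
        h[h == 0] = 1
        err = np.clip(d[h != y].sum(), 1e-10, 1 - 1e-10)
        if err >= 0.5:  # no better than chance: stop adding columns
            break
        alpha = 0.5 * np.log((1 - err) / err)
        d = d * np.exp(-alpha * y * h)
        d /= d.sum()
        columns.append(wq)
        alphas.append(alpha)
    return columns, alphas


def predict(X, columns, alphas, noise_std=0.1, seed=1):
    """Alpha-weighted vote of the noisy column decisions."""
    rng = np.random.default_rng(seed)
    Xb = add_bias(X)
    score = np.zeros(Xb.shape[0])
    for wq, alpha in zip(columns, alphas):
        h = np.sign(column_output(Xb, wq, noise_std, rng))
        h[h == 0] = 1
        score += alpha * h
    return np.where(score >= 0, 1, -1)
```

The sketch is binary. Ten-way MNIST classification would need some multiclass arrangement of such ensembles (for example one-vs-all or one-vs-one), which the abstract does not specify.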
