Paper information - In-Memory Computation of a Machine-Learning Classifier in a Standard 6T SRAM Array

In-Memory Computation of a Machine-Learning Classifier in a Standard 6T SRAM Array

This paper presents a machine-learning classifier where computations are performed in a standard 6T SRAM array, which stores the machine-learning model. Peripheral circuits implement mixed-signal weak classifiers via columns of the SRAM, and a training algorithm enables a strong classifier through boosting and also overcomes circuit nonidealities by combining multiple columns. A prototype 128 × 128 SRAM array, implemented in a 130-nm CMOS process, demonstrates ten-way classification of MNIST images (using image-pixel features downsampled from 28 × 28 = 784 to 9 × 9 = 81, which yields a baseline accuracy of 90%). In SRAM mode (bit-cell read/write), the prototype operates up to 300 MHz, and in classify mode, it operates at 50 MHz, generating a classification every cycle. With accuracy equivalent to a discrete SRAM/digital-MAC system, the system achieves ten-way classification at an energy of 630 pJ per decision, 113 times lower than a discrete system with a standard training algorithm and 13 times lower than a discrete system with the proposed training algorithm.
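The energy comparison in the abstract can be unpacked with some back-of-envelope arithmetic. The sketch below uses only the three figures quoted above (630 pJ, 113×, 13×). The derived per-decision energies for the discrete systems are not stated in the abstract; they are implied values computed from rounded ratios, so treat them as approximate. The function and variable names are illustrative only.

```python
# Figures quoted in the abstract.
ENERGY_PER_DECISION_PJ = 630.0    # in-memory prototype, ten-way classification
RATIO_STANDARD_TRAINING = 113.0   # discrete system, standard training algorithm
RATIO_PROPOSED_TRAINING = 13.0    # discrete system, proposed training algorithm


def implied_discrete_energy_nj(ratio: float) -> float:
    """Energy per decision (nJ) implied for a discrete system by a quoted ratio.

    Assumption: the ratio is applied directly to the 630 pJ figure.
    """
    return ENERGY_PER_DECISION_PJ * ratio / 1000.0


standard_nj = implied_discrete_energy_nj(RATIO_STANDARD_TRAINING)
proposed_nj = implied_discrete_energy_nj(RATIO_PROPOSED_TRAINING)

# Share of the total gain attributable to the training algorithm alone,
# within the discrete system (ratio of the two quoted ratios).
training_gain = RATIO_STANDARD_TRAINING / RATIO_PROPOSED_TRAINING

print(f"Discrete, standard training: ~{standard_nj:.2f} nJ per decision")
print(f"Discrete, proposed training: ~{proposed_nj:.2f} nJ per decision")
print(f"Gain from training algorithm alone: ~{training_gain:.1f}x")
```

Read this way, the quoted ratios suggest that roughly 8.7× of the 113× gap comes from the training algorithm and the remaining 13× from moving the computation into the SRAM array, under the assumption that the two factors simply multiply.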
