Analysis and Design of a Passive Switched-Capacitor Matrix Multiplier for Approximate Computing

A switched-capacitor matrix multiplier is presented for approximate computing and machine learning applications. The multiply-and-accumulate operations are performed as discrete-time charge-domain signal processing using passive switches and 300 aF unit capacitors. The result is digitized with a 6 b asynchronous successive approximation register (SAR) analog-to-digital converter (ADC). Incomplete charge accumulation and thermal noise are analyzed. The design was fabricated in 40 nm CMOS, and measured multiplication results are illustrated using matched filtering and image convolutions to analyze noise and offset. Two applications are highlighted: 1) an energy-efficient feature extraction layer that performs both compression and classification in a neural network for an analog front end and 2) analog acceleration for solving optimization problems that are traditionally solved in the digital domain. The chip achieves measured efficiencies of 8.7 TOPS/W at 1 GHz for the first application and 7.7 TOPS/W at 2.5 GHz for the second.
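As a rough illustration of the charge-domain multiply-and-accumulate idea, the following Python sketch models an idealized passive charge-sharing dot product followed by an ideal 6 b quantizer. It is not the circuit described in the paper: only the 300 aF unit capacitance and the 6 b resolution come from the abstract, while the function names, the integer weight encoding (a weight selects how many unit capacitors sample the input), the 1 V reference, and the 300 K temperature are assumptions made for illustration. Incomplete charge accumulation, switch parasitics, and mismatch are ignored.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K
C_UNIT = 300e-18            # 300 aF unit capacitor (value from the abstract)


def ktc_noise_rms(cap_farads, temp_kelvin=300.0):
    """RMS sampled thermal noise voltage sqrt(kT/C) on one capacitor."""
    return math.sqrt(K_BOLTZMANN * temp_kelvin / cap_farads)


def charge_share_mac(inputs_v, weights, units_per_input):
    """Idealized passive charge-sharing dot product (hypothetical model).

    Each input voltage is sampled onto `w` of its `units_per_input` unit
    capacitors; the remaining units hold zero charge. Shorting all units
    together gives V_out = sum(w_i * v_i) / (N * units_per_input).
    """
    total_cap = units_per_input * len(inputs_v) * C_UNIT
    charge = sum(w * C_UNIT * v for v, w in zip(inputs_v, weights))
    return charge / total_cap


def quantize(v, v_ref=1.0, bits=6):
    """Ideal ADC: map a voltage in [0, v_ref) to an integer code."""
    levels = 2 ** bits
    code = int(v / v_ref * levels)
    return max(0, min(levels - 1, code))  # clamp to the valid code range


if __name__ == "__main__":
    v_out = charge_share_mac([0.8, 0.4, 0.6, 0.2], [7, 3, 0, 5], units_per_input=7)
    print("shared output voltage (V):", v_out)
    print("6 b code:", quantize(v_out))
    print("kT/C noise on one unit cap (V rms):", ktc_noise_rms(C_UNIT))
```

Under these assumptions, the kT/C noise of a single 300 aF capacitor at 300 K is about 3.7 mV rms. Note that passive charge sharing attenuates the result by the total capacitance, which is one reason thermal noise and the ADC resolution have to be considered together.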
