Energy Efficient RRAM Crossbar-Based Approximate Computing for Smart Cameras

Smart cameras have been applied successfully in many fields, but limited battery capacity and power efficiency restrict their local processing capability. To shift vision processing closer to the sensors, we propose a power-efficient framework for analog approximate computing based on emerging metal-oxide resistive switching random-access memory (RRAM) devices. We first introduce a programmable RRAM-based approximate computing unit (RRAM-ACU) to accelerate approximate computation, and then build a scalable approximate computing framework on top of it. To program the RRAM-ACU efficiently, we also present a detailed configuration flow, which includes a customized approximator training scheme, an approximator-parameter-to-RRAM-state mapping algorithm, and an RRAM state tuning scheme. Simulation results on a set of diverse benchmarks demonstrate that, compared with an x86-64 CPU at 2 GHz, the RRAM-ACU achieves a 4.06–196.41× speedup and a power efficiency of 24.59–567.98 GFLOPS/W, with an average quality loss of 8.72%. An implementation of the HMAX application further demonstrates that the proposed RRAM-based approximate computing framework achieves more than 12.8× higher power efficiency than its digital counterparts (CPU, GPU, and FPGA).
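To make the idea of mapping approximator parameters onto RRAM states more concrete, the following is a minimal sketch of how a resistive crossbar can evaluate a weighted sum in the analog domain. It is an illustration under stated assumptions, not the mapping algorithm of this work: the conductance bounds `G_MIN` and `G_MAX`, the differential-pair encoding of signed weights, and the function names `weights_to_conductances` and `crossbar_mvm` are all hypothetical. It also ignores non-idealities such as IR drop, device variation, and sneak paths.

```python
# Hypothetical device limits (siemens); real values depend on the RRAM device.
G_MIN = 1e-6
G_MAX = 1e-4


def weights_to_conductances(weights):
    """Map signed weights to two non-negative conductance matrices.

    Assumption: each weight is stored as a differential pair, so the
    effective weight is proportional to (G_pos - G_neg).
    `weights` is a list of rows, one row per output.
    """
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    scale = (G_MAX - G_MIN) / w_max  # siemens per unit weight
    g_pos = [[G_MIN + scale * max(w, 0.0) for w in row] for row in weights]
    g_neg = [[G_MIN + scale * max(-w, 0.0) for w in row] for row in weights]
    return g_pos, g_neg, scale


def crossbar_mvm(g_pos, g_neg, voltages, scale):
    """Ideal crossbar read-out.

    Ohm's law gives a per-cell current V_i * G_ji, and Kirchhoff's current
    law sums the cell currents on each output line. Subtracting the two
    lines of a differential pair and dividing by `scale` recovers the
    weighted sum sum_i(w_ji * V_i).
    """
    outputs = []
    for row_pos, row_neg in zip(g_pos, g_neg):
        current = sum(v * (gp - gn)
                      for v, gp, gn in zip(voltages, row_pos, row_neg))
        outputs.append(current / scale)
    return outputs
```

For example, calling `weights_to_conductances([[0.5, -1.0], [2.0, 0.25]])` and passing the result to `crossbar_mvm` with input voltages `[0.2, 0.1]` reproduces the ordinary matrix-vector product up to floating-point rounding. In this idealized model the `G_MIN` offset cancels in the differential pair; a physical array would additionally need the state tuning step that the abstract mentions.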
