Approximate Computing Using Multiple-Access Single-Charge Associative Memory

Memory-based computing using associative memory is a promising way to reduce the energy consumption of important classes of streaming applications by avoiding redundant computations. A set of frequent patterns that represent basic functions are pre-stored in Ternary Content Addressable Memory (TCAM) and reused. The primary limitation to using associative memory in modern parallel processors is the large search energy required by TCAMs. In TCAMs, all rows that match, except hit rows, precharge and discharge for every search operation, resulting in high energy consumption. In this paper, we propose a new Multiple-Access Single-Charge (MASC) TCAM architecture which is capable of searching TCAM contents multiple times with only a single precharge cycle. In contrast to previous designs, the MASC TCAM keeps the match-line voltage of all miss-rows high and uses their charge for the next search operation, while only the hit rows discharge. We use periodic refresh to control the accuracy of the search. We also implement a new type of approximate associative memory by setting longer refresh times for MASC TCAMs, which yields search results within 1–2 bit Hamming distances of the exact value. To further decrease the energy consumption of MASC TCAM and reduce the area, we implement MASC with crossbar TCAMs. Our evaluation on AMD Southern Island GPU shows that using MASC (crossbar MASC) associative memory can improve the average floating point units energy efficiency by 33.4, 38.1, and 36.7 percent (37.7, 42.6, and 43.1 percent) for exact matching, selective 1-HD and 2-HD approximations respectively, providing an acceptable quality of service (PSNR > 30 dB and average relative error <10 percent). This shows that MASC (crossbar MASC) can achieve 1.77X (1.93X) higher energy savings as compared to the state of the art implementation of GPGPU that uses voltage overscaling on TCAM.

[1]  R. Symanczyk,et al.  Conductive bridging RAM (CBRAM): an emerging non-volatile memory technology scalable to sub 20nm , 2005, IEEE InternationalElectron Devices Meeting, 2005. IEDM Technical Digest..

[2]  Frederick T. Chen,et al.  Highly scalable hafnium oxide memory with improvements of resistive distribution and read disturb immunity , 2009, 2009 IEEE International Electron Devices Meeting (IEDM).

[3]  Jishen Zhao,et al.  Emerging Memory Technologies , 2019, IEEE Micro.

[4]  Tajana Simunic,et al.  ReMAM: Low energy Resistive Multi-stage Associative Memory for energy efficient computing , 2016, 2016 17th International Symposium on Quality Electronic Design (ISQED).

[5]  Divyakant Agrawal,et al.  Fast data stream algorithms using associative memories , 2007, SIGMOD '07.

[6]  Mohammed Ghanbari,et al.  Scope of validity of PSNR in image/video quality assessment , 2008 .

[7]  Ashish Goel,et al.  Small subset queries and bloom filters using ternary associative memories, with applications , 2010, SIGMETRICS '10.

[8]  Eby G. Friedman,et al.  AC-DIMM: associative computing with STT-MRAM , 2013, ISCA.

[9]  Lawrence Chisvin,et al.  Content-addressable and associative memory: alternatives to the ubiquitous RAM , 1989, Computer.

[10]  Hang Zhang,et al.  Low power GPGPU computation with imprecise hardware , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).

[11]  Kaushik Roy,et al.  Design of voltage-scalable meta-functions for approximate computing , 2011, 2011 Design, Automation & Test in Europe.

[12]  Swarup Bhunia,et al.  Nanoscale reconfigurable computing using non-volatile 2-D STTRAM array , 2009, 2009 9th IEEE Conference on Nanotechnology (IEEE-NANO).

[13]  Rainer Waser,et al.  Complementary resistive switches for passive nanocrossbar memories. , 2010, Nature materials.

[14]  Melvin A. Breuer,et al.  Multi-media applications and imprecise computation , 2005, 8th Euromicro Conference on Digital System Design (DSD'05).

[15]  Tajana Simunic,et al.  Resistive configurable associative memory for approximate computing , 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[16]  Narayan Srinivasa,et al.  A functional hybrid memristor crossbar-array/CMOS system for data storage and neuromorphic applications. , 2012, Nano letters.

[17]  Anand Rangarajan,et al.  Algorithms for advanced packet classification with ternary CAMs , 2005, SIGCOMM '05.

[18]  Daisuke Suzuki,et al.  Spintronics-based nonvolatile logic-in-memory architecture towards an ultra-low-power and highly reliable VLSI computing paradigm , 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[19]  Jason Cong,et al.  Energy-efficient computing using adaptive table lookup based on nonvolatile memories , 2013, International Symposium on Low Power Electronics and Design (ISLPED).

[20]  Luca Benini,et al.  Approximate associative memristive memory for energy-efficient GPUs , 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[21]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[22]  Abdul Majid Mazlina,et al.  Big Data Processing in Cloud Computing Environments , 2017 .

[23]  R. Waser,et al.  Nanoionics-based resistive switching memories. , 2007, Nature materials.

[24]  Luca Benini,et al.  CIRCA-GPUs: Increasing Instruction Reuse Through Inexact Computing in GP-GPUs , 2016, IEEE Design & Test.

[25]  Teuvo Kohonen,et al.  Associative memory. A system-theoretical approach , 1977 .

[26]  Tajana Simunic,et al.  ACAM: Approximate Computing Based on Adaptive Associative Memory with Online Learning , 2016, ISLPED.

[27]  Shoji Ikeda,et al.  A 3.14 um2 4T-2MTJ-cell fully parallel TCAM based on nonvolatile logic-in-memory architecture , 2012, 2012 Symposium on VLSI Circuits (VLSIC).

[28]  K. Pagiamtzis,et al.  Content-addressable memory (CAM) circuits and architectures: a tutorial and survey , 2006, IEEE Journal of Solid-State Circuits.

[29]  Christoforos E. Kozyrakis,et al.  Evaluating MapReduce for Multi-core and Multiprocessor Systems , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[30]  Jing Li,et al.  1 Mb 0.41 µm² 2T-2R Cell Nonvolatile TCAM With Two-Bit Encoding and Clocked Self-Referenced Sensing , 2014, IEEE Journal of Solid-State Circuits.

[31]  David R. Kaeli,et al.  Multi2Sim: A simulation framework for CPU-GPU computing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[32]  Rajesh K. Gupta,et al.  Resistive Bloom filters: From approximate membership to approximate computing with bounded errors , 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[33]  Kyoung-Rok Cho,et al.  Memristor MOS Content Addressable Memory (MCAM): Hybrid Architecture for Future High Performance Search Engines , 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[34]  Teuvo Kohonen,et al.  Content-addressable memories , 1980 .

[35]  Meng-Fan Chang,et al.  17.5 A 3T1R nonvolatile TCAM using MLC ReRAM with Sub-1ns search time , 2015, 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers.

[36]  Sethuraman Panchanathan,et al.  A content-addressable memory architecture for image coding using vector quantization , 1991, IEEE Trans. Signal Process..

[37]  Avita Katal,et al.  Big data: Issues, challenges, tools and Good practices , 2013, 2013 Sixth International Conference on Contemporary Computing (IC3).

[38]  Yuchao Yang,et al.  Complementary resistive switching in tantalum oxide-based resistive memory devices , 2012, 1204.3515.

[39]  Stefanos Kaxiras,et al.  IPStash: a set-associative memory approach for efficient IP-lookup , 2005, Proceedings IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies..

[40]  J. Yang,et al.  Memristive switching mechanism for metal/oxide/metal nanodevices. , 2008, Nature nanotechnology.

[41]  Tajana Simunic,et al.  MASC: Ultra-low energy multiple-access single-charge TCAM for approximate computing , 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[42]  Tajana Simunic,et al.  CAUSE: Critical application usage-aware memory system using non-volatile memory for mobile devices , 2015, 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[43]  Meng-Fan Chang,et al.  Challenges and Circuit Techniques for Energy-Efficient On-Chip Nonvolatile Memory Using Memristive Devices , 2015, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.