Resistive configurable associative memory for approximate computing

Modern computing machines are increasingly characterized by large scale parallelism in hardware (such as GPGPUs) and advent of large scale and innovative memory blocks. Parallelism enables expanded performance tradeoffs whereas memories enable reuse of computational work. To be effective, however, one needs to ensure energy efficiency with minimal reuse overheads. In this paper, we describe a resistive configurable associative memory (ReCAM) that enables selective approximation and asymmetric voltage overscaling to manage delivered efficiency. The ReCAM structure matches an input pattern with pre-stored ones by applying an approximate search on selected bit indices (bitline-configurable) or selective pre-stored patterns (row-configurable). To further reduce energy, we explore proper ReCAM sizing, various configurable search operations with low overhead voltage overscaling, and different ReCAM update policies. Experimental result on the AMD Southern Islands GPUs for eight applications shows bitline-configurable and row-configurable ReCAM achieve on average to 43.6% and 44.5% energy savings with an acceptable quality loss of 10%.

[1]  Eby G. Friedman,et al.  AC-DIMM: associative computing with STT-MRAM , 2013, ISCA.

[2]  Luca Benini,et al.  Temporal memoization for energy-efficient timing error recovery in GPGPUs , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[3]  Daisuke Suzuki,et al.  Spintronics-based nonvolatile logic-in-memory architecture towards an ultra-low-power and highly reliable VLSI computing paradigm , 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[4]  Rakesh Agarwal,et al.  Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.

[5]  Weisheng Zhao,et al.  A High-Speed Robust NVM-TCAM Design Using Body Bias Feedback , 2015, ACM Great Lakes Symposium on VLSI.

[6]  Swarup Bhunia,et al.  Nanoscale reconfigurable computing using non-volatile 2-D STTRAM array , 2009, 2009 9th IEEE Conference on Nanotechnology (IEEE-NANO).

[7]  Meng-Fan Chang,et al.  ReRAM-based 4T2R nonvolatile TCAM with 7x NVM-stress reduction, and 4x improvement in speed-wordlength-capacity for normally-off instant-on filter-based search engines used in big-data processing , 2014, 2014 Symposium on VLSI Circuits Digest of Technical Papers.

[8]  Luca Benini,et al.  Energy-efficient GPGPU architectures via collaborative compilation and memristive memory-based computing , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).

[9]  Luis Ceze,et al.  Neural Acceleration for General-Purpose Approximate Programs , 2014, IEEE Micro.

[10]  Teuvo Kohonen,et al.  Associative memory. A system-theoretical approach , 1977 .

[11]  Teuvo Kohonen,et al.  Content-addressable memories , 1980 .

[12]  Anand Rangarajan,et al.  Algorithms for advanced packet classification with ternary CAMs , 2005, SIGCOMM '05.

[13]  Peilin Song,et al.  1Mb 0.41 µm2 2T-2R cell nonvolatile TCAM with two-bit encoding and clocked self-referenced sensing , 2013, 2013 Symposium on VLSI Circuits.

[14]  Meng-Fan Chang,et al.  17.5 A 3T1R nonvolatile TCAM using MLC ReRAM with Sub-1ns search time , 2015, 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers.

[15]  Kaushik Roy,et al.  Design of voltage-scalable meta-functions for approximate computing , 2011, 2011 Design, Automation & Test in Europe.

[16]  Ashish Goel,et al.  Small subset queries and bloom filters using ternary associative memories, with applications , 2010, SIGMETRICS '10.

[17]  Hang Zhang,et al.  Low power GPGPU computation with imprecise hardware , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).

[18]  George Varghese,et al.  Tree bitmap: hardware/software IP lookups with incremental updates , 2004, CCRV.

[19]  Manoj Sachdev,et al.  Low-Leakage Storage Cells for Ternary Content Addressable Memories , 2009, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[20]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[21]  David Blaauw,et al.  Shortstop: An on-chip fast supply boosting technique , 2013, 2013 Symposium on VLSI Circuits.

[22]  Jason Cong,et al.  Energy-efficient computing using adaptive table lookup based on nonvolatile memories , 2013, International Symposium on Low Power Electronics and Design (ISLPED).

[23]  Luca Benini,et al.  Approximate associative memristive memory for energy-efficient GPUs , 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[24]  Jing Li,et al.  1 Mb 0.41 µm² 2T-2R Cell Nonvolatile TCAM With Two-Bit Encoding and Clocked Self-Referenced Sensing , 2014, IEEE Journal of Solid-State Circuits.

[25]  Yiran Chen,et al.  Design of Spin-Torque Transfer Magnetoresistive RAM and CAM/TCAM with High Sensing and Search Speed , 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[26]  Christoforos E. Kozyrakis,et al.  Evaluating MapReduce for Multi-core and Multiprocessor Systems , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[27]  Tajana Simunic,et al.  CAUSE: Critical application usage-aware memory system using non-volatile memory for mobile devices , 2015, 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[28]  Song Huang,et al.  On the energy efficiency of graphics processing units for scientific computing , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.

[29]  Divyakant Agrawal,et al.  Fast data stream algorithms using associative memories , 2007, SIGMOD '07.

[30]  David R. Kaeli,et al.  Multi2Sim: A simulation framework for CPU-GPU computing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).