ACAM: Approximate Computing Based on Adaptive Associative Memory with Online Learning

The Internet of Things (IoT) dramatically increases the amount of data to be processed by many applications, including multimedia. Unlike in traditional computing environments, IoT workloads vary significantly over time. Efficient runtime profiling is therefore required to extract the most frequent computations and pre-store them for memory-based computing. In this paper, we propose ACAM, an approximate computing technique that uses a low-cost adaptive associative memory with runtime learning and profiling. To capture the temporal locality of data in real-world applications, our design uses a reinforcement learning algorithm with a least recently used (LRU) strategy to select the images to be profiled; the profiler is implemented using an approximate concurrent state machine. The profiling results are then stored in ACAM for computation reuse. Because the selected images are representative of the observed input dataset, the associative memory achieves high hit rates, which lets us avoid redundant computations. We evaluate ACAM on the recent AMD Southern Islands GPU architecture, and the experimental results show that the proposed design achieves 34.7% energy savings for image processing applications with an acceptable quality of service (i.e., PSNR > 30 dB).
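The idea of computation reuse with LRU replacement, checked against a PSNR threshold, can be illustrated with a small software sketch. This is not the ACAM hardware, its reinforcement learning selection, or its profiler. It is a minimal stand-in under stated assumptions: approximate matching is modeled by dropping low-order operand bits, and the names `LRUReuseTable`, `drop_bits`, `blend`, and `psnr` are hypothetical and chosen only for illustration.

```python
import math
from collections import OrderedDict


def psnr(reference, approximate, peak=255):
    """PSNR in dB between two equal-length sequences of pixel values."""
    mse = sum((r - a) ** 2 for r, a in zip(reference, approximate)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


class LRUReuseTable:
    """Hypothetical software model of a lookup table with LRU replacement.

    Assumption: inputs that agree after dropping `drop_bits` low-order bits
    are treated as a match, which is one simple way to model approximate hits.
    """

    def __init__(self, capacity, drop_bits=2):
        self.capacity = capacity
        self.drop_bits = drop_bits
        self.entries = OrderedDict()  # key -> stored result, oldest first
        self.hits = 0
        self.misses = 0

    def _key(self, operands):
        return tuple(v >> self.drop_bits for v in operands)

    def lookup(self, operands, compute):
        key = self._key(operands)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]       # reuse the stored result
        self.misses += 1
        result = compute(*operands)        # fall back to exact computation
        self.entries[key] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return result


def blend(a, b):
    """Example pixel operation (assumed for illustration): integer average."""
    return (a + b) // 2


if __name__ == "__main__":
    table = LRUReuseTable(capacity=64)
    pixels = [(i % 256, (i * 7) % 256) for i in range(1000)]
    exact = [blend(a, b) for a, b in pixels]
    approx = [table.lookup(p, blend) for p in pixels]
    quality = psnr(exact, approx)
    print(f"hits={table.hits} misses={table.misses} PSNR={quality:.1f} dB")
    print("acceptable" if quality > 30 else "below the 30 dB threshold")
```

In this sketch, a larger `drop_bits` raises the hit rate at the cost of lower PSNR, which mirrors the trade-off between energy savings and quality of service described above.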
