Ultra-efficient processing in-memory for data intensive applications

Recent years have witnessed rapid growth in the Internet of Things (IoT). This network of billions of devices generates and exchanges huge amounts of data. Limited cache capacity and memory bandwidth make transferring and processing such data on traditional CPUs and GPUs highly inefficient, in terms of both energy consumption and delay. However, many IoT applications are statistical at heart and can tolerate some inaccuracy in their computation. This lets designers reduce the complexity of processing by approximating results to a desired accuracy. In this paper, we propose an ultra-efficient approximate processing in-memory architecture, called APIM, which exploits the analog characteristics of non-volatile memories to support addition and multiplication inside the crossbar memory while it stores the data. The proposed design eliminates the overhead of transferring data to the processor by virtually bringing the processor inside the memory. APIM dynamically configures the precision of computation for each application in order to tune the level of accuracy at runtime. Our experimental evaluation, running six general OpenCL applications, shows that the proposed design achieves up to 20× performance improvement and a 480× improvement in energy-delay product while ensuring acceptable quality of service. In exact mode, it achieves 28× energy savings and a 4.8× speedup compared to state-of-the-art GPU cores.
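The idea of tuning precision against an accuracy target can be illustrated in software. The sketch below is not APIM's mechanism, which operates in the analog domain inside a crossbar memory; it substitutes simple bit truncation of the operands as a stand-in approximation. The function names (`approx_multiply`, `relative_error`, `largest_dropped_bits`) and the `width` parameter are hypothetical and chosen only for this illustration.

```python
def approx_multiply(a: int, b: int, dropped_bits: int) -> int:
    """Multiply two non-negative integers after zeroing the lowest
    `dropped_bits` bits of each operand (illustrative stand-in for
    reduced-precision multiplication, not the APIM circuit)."""
    if a < 0 or b < 0 or dropped_bits < 0:
        raise ValueError("operands and dropped_bits must be non-negative")
    a_trunc = (a >> dropped_bits) << dropped_bits
    b_trunc = (b >> dropped_bits) << dropped_bits
    return a_trunc * b_trunc


def relative_error(exact: int, approx: int) -> float:
    """Relative error of `approx` with respect to `exact`."""
    if exact == 0:
        return 0.0
    return abs(exact - approx) / exact


def largest_dropped_bits(a: int, b: int, max_rel_error: float,
                         width: int = 32) -> int:
    """Return the largest truncation level whose relative error stays
    within `max_rel_error`. Truncating more bits never increases the
    product, so the error is non-decreasing and the scan can stop at
    the first level that exceeds the budget."""
    exact = a * b
    best = 0
    for k in range(width + 1):
        if relative_error(exact, approx_multiply(a, b, k)) <= max_rel_error:
            best = k
        else:
            break
    return best


if __name__ == "__main__":
    a, b = 1000, 1000
    k = largest_dropped_bits(a, b, max_rel_error=0.05)
    approx = approx_multiply(a, b, k)
    print(k, approx, relative_error(a * b, approx))
```

In this toy model, a larger error budget allows more low-order bits to be ignored, which mirrors the trade-off between accuracy and cost described in the abstract; the actual energy and delay savings depend on the hardware and are not modeled here.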
