GenPIM: Generalized processing in-memory to accelerate data intensive applications

Big data has become a serious problem as data volumes have been skyrocketing for the past few years. Storage and CPU technologies are overwhelmed by the amount of data they have to handle. Traditional computer architectures show poor performance when processing such huge data. Processing inmemory is a promising technique to address data movement issue by locally processing data inside memory. However, there are two main issues with stand-alone PIM designs: (i) PIM is not always computationally faster than CMOS logic, (ii) PIM cannot process all operations in many applications. Thus, not many applications can benefit from PIM. To generalize the use of PIM, we designed GenPIM, a general processing in-memory architecture consisting of the conventional processor as well as the PIM accelerators. GenPIM supports basic PIM functionalities in specialized nonvolatile memory including: bitwise operations, search operation, addition and multiplication. For each application, GenPIM identifies the part which uses PIM operations, and processes the rest of non-PIM operations or not data intensive part of applications in general purpose cores. GenPIM also enables configurable PIM approximation by relaxing in-memory computation. We test the efficiency of proposed design over two learning applications. Our experimental evaluation shows that our design can achieve 10.9 χ improvement in energy efficiency and 6.4 χ speedup as compared to processing data in conventional cores.

[1]  Farinaz Koushanfar,et al.  LookNN: Neural network with no multiplication , 2017, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.

[2]  Tajana Simunic,et al.  NNgine: Ultra-Efficient Nearest Neighbor Accelerator Based on In-Memory Computing , 2017, 2017 IEEE International Conference on Rebooting Computing (ICRC).

[3]  Eby G. Friedman,et al.  VTEAM – A General Model for Voltage Controlled Memristors , 2014 .

[4]  Rami G. Melhem,et al.  PRES: Pseudo-Random Encoding Scheme to increase the bit flip reduction in the memory , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[5]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[6]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[7]  A. Krizhevsky Convolutional Deep Belief Networks on CIFAR-10 , 2010 .

[8]  Mohsen Imani,et al.  Ultra-efficient processing in-memory for data intensive applications , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).

[9]  B. S. Manjunath,et al.  Content-based search of video using color, texture, and motion , 1997, Proceedings of International Conference on Image Processing.

[10]  อนิรุธ สืบสิงห์,et al.  Data Mining Practical Machine Learning Tools and Techniques , 2014 .

[11]  Tajana Simunic,et al.  MASC: Ultra-low energy multiple-access single-charge TCAM for approximate computing , 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[12]  Tajana Simunic,et al.  CAUSE: Critical application usage-aware memory system using non-volatile memory for mobile devices , 2015, 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[13]  Nishil Talati,et al.  Logic Design Within Memristive Memories Using Memristor-Aided loGIC (MAGIC) , 2016, IEEE Transactions on Nanotechnology.

[14]  Tajana Simunic,et al.  MPIM: Multi-purpose in-memory processing using configurable resistive memory , 2017, 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC).

[15]  Mohsen Imani,et al.  Approximate Computing Using Multiple-Access Single-Charge Associative Memory , 2018, IEEE Transactions on Emerging Topics in Computing.

[16]  Stefan Poslad,et al.  Ubiquitous Computing: Smart Devices, Environments and Interactions , 2009 .

[17]  Tajana Simunic,et al.  Resistive configurable associative memory for approximate computing , 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[18]  Anne Siemon,et al.  A Complementary Resistive Switch-Based Crossbar Array Adder , 2015, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.

[19]  Miao Hu,et al.  ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[20]  Tajana Simunic,et al.  Efficient query processing in crossbar memory , 2017, 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED).

[21]  Tajana Rosing,et al.  Resistive CAM Acceleration for Tunable Approximate Computing , 2019, IEEE Transactions on Emerging Topics in Computing.

[22]  Kiyoung Choi,et al.  A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[23]  Didier Stricker,et al.  Creating and benchmarking a new dataset for physical activity monitoring , 2012, PETRA '12.