Opportunities and Challenges of Performing Vector Operations inside the DRAM

To overcome the low memory bandwidth and the high energy costs associated with data transfers between the processor and the main memory, proposals for near-data computing have started to gain acceptance in systems ranging from embedded architectures to high-performance computing. The main previous approaches propose application-specific hardware or require a large amount of logic. Moreover, most proposals require algorithm changes and do not exploit the full parallelism available inside DRAM devices. These issues limit the adoption and the performance of near-data computing. In this paper, we propose to implement vector instructions directly inside the DRAM devices, an approach we call Memory Vector Extensions (MVX). This balanced approach reduces data movement between the DRAM and the processor while requiring only a small amount of hardware to achieve good performance. Compared to the vector operations available in current processors, our proposal enables performance gains of up to 97x and reduces the energy consumption of the full system by up to 70x.

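To make the idea concrete, the sketch below contrasts a conventional processor-side vector loop, in which every operand crosses the memory bus before the processor's vector units can touch it, with a hypothetical MVX-style offload in which the operation executes inside the DRAM device. The mvx_vadd name and its stub implementation are illustrative assumptions for this sketch, not the paper's actual interface.

/* Minimal sketch contrasting a conventional processor-side vector loop with a
 * hypothetical MVX-style offload. The mvx_* names below are illustrative
 * placeholders, NOT the paper's actual API. */
#include <stddef.h>
#include <stdio.h>

/* Conventional path: every element of a, b and c crosses the memory bus so the
 * processor's vector units (e.g. SSE/AVX lanes) can operate on it. */
static void cpu_vadd(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)   /* compilers typically auto-vectorize this loop */
        c[i] = a[i] + b[i];
}

/* Hypothetical MVX path: the processor issues a single command and the addition
 * is performed inside the DRAM device, next to the row buffers, so only the
 * command (not the data) travels over the bus. Modeled here as a host-side stub. */
static void mvx_vadd(const float *a, const float *b, float *c, size_t n)
{
    /* Stand-in for in-DRAM execution; a real device would do this internally. */
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

int main(void)
{
    enum { N = 1024 };
    static float a[N], b[N], c[N];
    for (size_t i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f; }

    cpu_vadd(a, b, c, N);   /* data moves DRAM -> caches -> vector registers */
    mvx_vadd(a, b, c, N);   /* conceptually, only the command moves */

    printf("c[10] = %.1f\n", c[10]);
    return 0;
}

The design point the sketch tries to capture is that, in the MVX case, only commands cross the processor-memory interface, while the wide internal row buffers and the parallelism across DRAM banks perform the element-wise work in place.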