Operand size reconfiguration for big data processing in memory

Nowadays, applications that predominantly perform lookups over large databases are becoming more popular with column-stores as the database system architecture of choice. For these applications, Hybrid Memory Cubes (HMCs) can provide bandwidth of up to 320 GB/s and represents the best choice to keep the throughput for these ever increasing databases. However, even with the high available memory bandwidth and processing power, in order to achieve the peak performance, data movements through the memory hierarchy consumes an unnecessary amount of time and energy. In order to accelerate database operations, and reduce the energy consumption of the system, this paper presents the Reconfigurable Vector Unit (RVU) that enables massive and adaptive in-memory processing, extending the native HMC instructions and also increasing its effectiveness. RVU enables the programmer to reconfigure it to perform as a large vector unit or multiple small vectors units to better adjust for the application needs during different computation phases. Due to its adaptability, RVU is capable of achieving performance increase of 27 χ on average and reduce the DRAM energy consumption in 29% when compared to an x86 processor with 16 cores. Compared with the state-of-the-art mechanism capable of performing large vector operations with fixed size, inside the HMC, RVU performed up to 12% better in terms of performance and improve in 53% the energy consumption.

[1]  Luigi Carro,et al.  Exploring Cache Size and Core Count Tradeoffs in Systems with Reduced Memory Access Latency , 2016, 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP).

[2]  J. Jeddeloh,et al.  Hybrid memory cube new DRAM architecture increases density and performance , 2012, 2012 Symposium on VLSI Technology (VLSIT).

[3]  Luigi Carro,et al.  Large vector extensions inside the HMC , 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[4]  Feifei Li,et al.  NDC: Analyzing the impact of 3D-stacked memory+logic devices on MapReduce workloads , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[5]  Duncan G. Elliott,et al.  Computational RAM: Implementing Processors in Memory , 1999, IEEE Des. Test Comput..

[6]  Philippe Olivier Alexandre Navaux,et al.  SiNUCA: A Validated Micro-Architecture Simulator , 2015, 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems.

[7]  Setrag Khoshafian,et al.  A decomposition storage model , 1985, SIGMOD Conference.

[8]  Jung Ho Ahn,et al.  Near-DRAM Acceleration with Single-ISA Heterogeneous Processing in Standard Memory Modules , 2016, IEEE Micro.

[9]  Daniel J. Abadi,et al.  Column oriented Database Systems , 2009, Proc. VLDB Endow..

[10]  Tejas Karkhanis,et al.  Active Memory Cube: A processing-in-memory architecture for exascale systems , 2015, IBM J. Res. Dev..

[11]  A. Jourdain,et al.  3D stacked IC demonstration using a through Silicon Via First approach , 2008, 2008 IEEE International Electron Devices Meeting.

[12]  Simranjit Kaur,et al.  A Comparative Study of Database Systems , 2012 .

[13]  Kiyoung Choi,et al.  A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[14]  Luigi Carro,et al.  Saving memory movements through vector processing in the DRAM , 2015, 2015 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES).

[15]  Martin L. Kersten,et al.  Optimizing database architecture for the new bottleneck: memory access , 2000, The VLDB Journal.

[16]  Christoforos E. Kozyrakis,et al.  A case for intelligent RAM , 1997, IEEE Micro.

[17]  Thomas Neumann,et al.  TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark , 2013, TPCTC.

[18]  Michael Stonebraker,et al.  Intel "big data" science and technology center vision and execution plan , 2013, SGMD.

[19]  Jung Ho Ahn,et al.  DRAMA: An Architecture for Accelerated Processing Near Memory , 2015, IEEE Computer Architecture Letters.

[20]  Jung Ho Ahn,et al.  The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing , 2013, TACO.

[21]  Manos Athanassoulis,et al.  Beyond the Wall: Near-Data Processing for Databases , 2015, DaMoN.

[22]  Marcin Zukowski,et al.  MonetDB/X100: Hyper-Pipelining Query Execution , 2005, CIDR.