Fixed-function hardware sorting accelerators for near data MapReduce execution

A large fraction of MapReduce execution time is spent in the Map phase, and a large fraction of Map phase execution time is spent sorting the intermediate key-value pairs generated by the Map function. Sorting accelerators can achieve high performance and low power because they avoid the overheads of sorting on general-purpose hardware, such as instruction fetch and decode. We find that sorting accelerators are a good match for 3D-stacked Near Data Processing (NDP) because their sorting throughput is high enough to saturate the memory bandwidth available in other memory organizations. The increased sorting performance and low power requirements of fixed-function hardware lead to very high Map phase performance and energy efficiency, reducing Map phase execution time by up to 92% and energy consumption by up to 91%. We further find that sorting accelerators in a less exotic form of NDP outperform more expensive forms of 3D-stacked NDP without accelerators. We also implement the accelerator on an FPGA to validate our claims.
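To make the targeted workload concrete, the sketch below (not the paper's code; the word-count Map function and data are illustrative assumptions) shows the software Map-phase work a fixed-function sorter would offload: the Map function emits intermediate key-value pairs, which are then sorted by key before being handed to Reduce. On a general-purpose CPU, the final sort is the step that dominates Map-phase time in this paper's analysis.

```cpp
// Hypothetical sketch of the software Map-phase baseline (word count),
// illustrating the key-value sort that a hardware sorting accelerator replaces.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, uint64_t>;

// Example Map function: emit <word, 1> for each word in one input record.
std::vector<KeyValue> map_record(const std::string& record) {
    std::vector<KeyValue> out;
    std::istringstream words(record);
    std::string word;
    while (words >> word) {
        out.emplace_back(word, 1);
    }
    return out;
}

int main() {
    const std::vector<std::string> input = {
        "the quick brown fox", "the lazy dog", "the fox"};

    // Map phase: generate intermediate key-value pairs.
    std::vector<KeyValue> intermediate;
    for (const auto& record : input) {
        auto kvs = map_record(record);
        intermediate.insert(intermediate.end(), kvs.begin(), kvs.end());
    }

    // Sort the intermediate pairs by key. This is the step the paper targets:
    // in hardware it runs as a fixed-function pipeline next to memory instead
    // of as instruction-by-instruction execution on a general-purpose core.
    std::sort(intermediate.begin(), intermediate.end(),
              [](const KeyValue& a, const KeyValue& b) { return a.first < b.first; });

    for (const auto& [key, value] : intermediate) {
        std::cout << key << " -> " << value << '\n';
    }
    return 0;
}
```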
