Memory system characterization of big data workloads

Two recent trends have emerged: (1) rapid growth in big data technologies, with new computing models such as MapReduce and NoSQL for handling unstructured data; and (2) a growing focus on the memory subsystem for performance and power optimization, particularly as emerging memory technologies offer characteristics that differ from conventional DRAM (bandwidth, read/write asymmetry). This paper examines how these trends may intersect by characterizing the memory access patterns of various Hadoop and NoSQL big data workloads. Using memory DIMM traces collected with special hardware, we analyze spatial and temporal reference patterns to bring out several insights into memory and platform usage, such as memory footprints, read-write ratios, bandwidths, and latencies. We develop an analysis methodology to understand how conventional optimizations such as caching, prediction, and prefetching may apply to these workloads, and we discuss the implications for software and system design.
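
The abstract does not specify the trace format or how each metric is computed. The following is a minimal sketch, not the authors' methodology, of how the metrics listed above could be derived from a memory trace. It assumes a hypothetical record format of (timestamp in ns, physical address, is-write), a 64-byte cache-line granularity, and one line transferred per record. It computes footprint, read-write ratio, and bandwidth, then estimates caching potential with an LRU hit-rate simulation and prefetching potential with a crude next-line measure. All names (`TraceRecord`, `summarize`, `lru_hit_rate`, `next_line_fraction`, `LINE_SIZE`) are illustrative.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical trace record: (timestamp_ns, physical_address, is_write).
TraceRecord = Tuple[int, int, bool]

LINE_SIZE = 64  # assumed cache-line granularity in bytes


@dataclass
class TraceStats:
    footprint_bytes: int     # distinct lines touched * LINE_SIZE
    read_write_ratio: float  # reads / writes (inf if there are no writes)
    bandwidth_gbps: float    # bytes moved per ns over the trace window (= GB/s)


def summarize(trace: List[TraceRecord]) -> TraceStats:
    """Footprint, read-write ratio and average bandwidth of a trace."""
    lines = set()
    reads = writes = 0
    for _, addr, is_write in trace:
        lines.add(addr // LINE_SIZE)
        if is_write:
            writes += 1
        else:
            reads += 1
    duration_ns = trace[-1][0] - trace[0][0] if len(trace) > 1 else 0
    total_bytes = len(trace) * LINE_SIZE  # assumes one line per record
    bandwidth = total_bytes / duration_ns if duration_ns > 0 else 0.0
    ratio = reads / writes if writes else float("inf")
    return TraceStats(len(lines) * LINE_SIZE, ratio, bandwidth)


def lru_hit_rate(trace: List[TraceRecord], capacity_lines: int) -> float:
    """Hit rate of a fully associative LRU cache holding capacity_lines lines."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    hits = 0
    for _, addr, _ in trace:
        line = addr // LINE_SIZE
        if line in cache:
            hits += 1
            cache.move_to_end(line)  # mark as most recently used
        else:
            cache[line] = None
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace) if trace else 0.0


def next_line_fraction(trace: List[TraceRecord]) -> float:
    """Fraction of consecutive accesses that go to the next line
    (a rough indicator of next-line prefetch opportunity)."""
    lines = [addr // LINE_SIZE for _, addr, _ in trace]
    if len(lines) < 2:
        return 0.0
    seq = sum(1 for prev, cur in zip(lines, lines[1:]) if cur == prev + 1)
    return seq / (len(lines) - 1)


if __name__ == "__main__":
    trace = [(0, 0, False), (10, 64, False), (20, 0, True),
             (30, 128, False), (40, 0, False)]
    print(summarize(trace))
    # TraceStats(footprint_bytes=192, read_write_ratio=4.0, bandwidth_gbps=8.0)
    print(lru_hit_rate(trace, 2))    # 0.4
    print(next_line_fraction(trace))  # 0.25
```

Sweeping `capacity_lines` in `lru_hit_rate` yields a hit-rate curve, one simple way to judge how much a given cache size would help a workload. Real DIMM traces would first need conversion into this hypothetical record format.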
