A time-energy performance analysis of MapReduce on heterogeneous systems with GPUs

Motivated by the explosion of Big Data analytics, performance improvements in low-power (wimpy) systems, and the increasing energy efficiency of GPUs, this paper presents a time-energy performance analysis of MapReduce on heterogeneous systems with GPUs. We evaluate the time and energy performance of three MapReduce applications with diverse resource demands on a Hadoop-CUDA framework. As executing these applications on heterogeneous systems with GPUs is challenging, we introduce a novel lazy processing technique which requires no modifications to the underlying Hadoop framework. To analyze the impact of heterogeneity, we compare heterogeneous CPU+GPU with homogeneous CPU-only execution across three systems with diverse characteristics: (i) a traditional high-performance (brawny) Intel i7 system hosting a discrete 640-core Nvidia GPU of the latest Maxwell generation, (ii) a wimpy platform consisting of a quad-core ARM Cortex-A9 hosting the same discrete Maxwell GPU, and (iii) a wimpy platform integrating four ARM Cortex-A15 cores and 192 Nvidia Kepler GPU cores on the same chip. These systems encompass both intra-node heterogeneity with discrete GPUs and intra-chip heterogeneity with integrated GPUs. Our measurement-based performance analysis highlights the following results. For compute-intensive workloads, the brawny heterogeneous system achieves speedups of up to 2.3 and reduces energy usage by almost half compared to the brawny homogeneous system. As expected, for applications where data transfers dominate the execution time, heterogeneity exhibits worse time-energy performance than homogeneous execution. For such applications, the heterogeneous wimpy A9 system with a discrete GPU uses around 14 times the energy of the homogeneous A9 system, due to both system resource imbalances and the high power overhead of the discrete GPU. However, among the heterogeneous systems, the wimpy A15 with integrated GPU uses the lowest energy across all workloads. This allows us to establish an execution time equivalence ratio between a single brawny node and multiple wimpy nodes. Based on this equivalence ratio, the wimpy nodes exhibit energy savings of two-thirds while maintaining the same execution time. This result advocates the potential of heterogeneous wimpy systems with integrated GPUs for Big Data analytics.
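The abstract only names the lazy processing technique without describing its mechanism. As a rough illustration of the general idea (a minimal sketch under our own assumptions, not the authors' implementation), the Hadoop mapper below buffers input records in map() and offloads the whole batch to the GPU once, in cleanup(), so the stock Hadoop framework itself is left unmodified. GpuBatchKernel is a hypothetical stand-in for a JNI/CUDA helper, not a real API.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LazyGpuMapper extends Mapper<LongWritable, Text, LongWritable, FloatWritable> {

    // Lazy processing: records are only buffered here; no per-record GPU call is made.
    private final List<String> buffer = new ArrayList<>();

    @Override
    protected void map(LongWritable key, Text value, Context context) {
        buffer.add(value.toString()); // defer all processing until cleanup()
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Single batched transfer and kernel launch, executed once per map task.
        float[] results = GpuBatchKernel.process(buffer.toArray(new String[0]));
        for (int i = 0; i < results.length; i++) {
            context.write(new LongWritable(i), new FloatWritable(results[i]));
        }
    }
}

/** Hypothetical stand-in (assumption) for a JNI wrapper around a CUDA kernel. */
class GpuBatchKernel {
    static float[] process(String[] records) {
        // Placeholder: a real implementation would copy the batch to the GPU,
        // launch the CUDA kernel, and copy the results back to the host.
        float[] out = new float[records.length];
        for (int i = 0; i < records.length; i++) {
            out[i] = records[i].length();
        }
        return out;
    }
}

Batching the offload in this way would amortize the per-record JNI and PCIe transfer overhead that would otherwise dominate on the systems with discrete GPUs, which is consistent with the abstract's observation that transfer-dominated applications fare worse on heterogeneous configurations.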
