Scaling up MapReduce-based Big Data Processing on Multi-GPU systems

MapReduce is a popular data-parallel processing model that has benefited from recent advances in computing technology and is widely used for large-scale data analysis. The high demand for MapReduce has stimulated the investigation of implementations on different architectural models and computing paradigms, such as multi-core clusters, Clouds, Cubieboards and GPUs. In particular, current GPU-based MapReduce approaches focus mainly on single-GPU algorithms and cannot handle large data sets because of the limited GPU memory capacity. Building on the previous multi-GPU MapReduce version MGMR, this paper proposes an upgraded version, MGMR++, which removes the GPU memory limitation, and a pipelined version, PMGMR, which handles the Big Data challenge by using both CPU memory and hard disks. MGMR++ extends MGMR with flexible C++ templates and CPU memory utilization, while PMGMR fine-tunes performance through recent GPU features such as streams and Hyper-Q, and through hard disk utilization. Compared to MGMR (Jiang et al., Cluster Computing 2013), the proposed schemes achieve about a 2.5-fold performance improvement, increase system scalability, and allow programmers to write straightforward MapReduce code for Big Data.
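The idea of processing a data set that exceeds device memory can be illustrated with a minimal C++ sketch. It is not the MGMR++ or PMGMR API: `chunked_map_reduce`, `word_count`, and the `chunk_size` parameter are hypothetical names chosen for illustration, and the code runs sequentially on the CPU with no GPU, streams, or disk I/O. It only shows the chunking pattern, assuming that `chunk_size` stands in for the number of records that fit in GPU memory and that partial results are merged in host (CPU) memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not the MGMR++ API): applies map and reduce to `input`
// one chunk at a time. `chunk_size` is an assumed stand-in for the number of
// records that fit in device memory; results accumulate in host memory.
template <typename Key, typename Val, typename In, typename MapFn, typename ReduceFn>
std::map<Key, Val> chunked_map_reduce(const std::vector<In>& input,
                                      std::size_t chunk_size,
                                      MapFn map_fn, ReduceFn reduce_fn) {
    assert(chunk_size > 0);
    std::map<Key, Val> result;
    for (std::size_t begin = 0; begin < input.size(); begin += chunk_size) {
        const std::size_t end = std::min(begin + chunk_size, input.size());

        // Map phase for this chunk: each record may emit key/value pairs.
        std::vector<std::pair<Key, Val>> emitted;
        for (std::size_t i = begin; i < end; ++i) {
            map_fn(input[i], emitted);
        }

        // Reduce phase: fold the chunk's pairs into the host-side result.
        for (const auto& kv : emitted) {
            auto it = result.find(kv.first);
            if (it == result.end()) {
                result.emplace(kv.first, kv.second);
            } else {
                it->second = reduce_fn(it->second, kv.second);
            }
        }
    }
    return result;
}

// Example use: word count, where the map step emits (word, 1) and the
// reduce step adds counts.
std::map<std::string, int> word_count(const std::vector<std::string>& words,
                                      std::size_t chunk_size) {
    auto map_fn = [](const std::string& w,
                     std::vector<std::pair<std::string, int>>& out) {
        out.emplace_back(w, 1);
    };
    auto reduce_fn = [](int a, int b) { return a + b; };
    return chunked_map_reduce<std::string, int>(words, chunk_size, map_fn, reduce_fn);
}
```

Because the reduce function here is associative and commutative, the result does not depend on `chunk_size`. That independence is what would allow chunks to be overlapped or distributed across devices in a pipelined design.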

[1] Tong Liu, et al. The development of Mellanox/NVIDIA GPUDirect over InfiniBand—a new model for GPU to GPU communications, 2011, Computer Science - Research and Development.

[2] Madhusudhan Govindaraju, et al. MARLA: MapReduce for Heterogeneous Clusters, 2012, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012).

[3] Rajat Raina, et al. Large-scale deep unsupervised learning using graphics processors, 2009, ICML '09.

[4] Tom White, et al. Hadoop: The Definitive Guide, 2009.

[5] Johan A. K. Suykens, et al. Optimized Data Fusion for Kernel k-Means Clustering, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6] Jonathan Schaeffer, et al. On the Versatility of Parallel Sorting by Regular Sampling, 1993, Parallel Comput.

[7] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[8] Gagan Agrawal, et al. Optimizing MapReduce for GPUs with effective shared memory usage, 2012, HPDC '12.

[9] Hai Jiang, et al. MGMR: Multi-GPU Based MapReduce, 2013, GPC.

[10] John D. Owens, et al. Multi-GPU MapReduce on GPU Clusters, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.

[11] Tomohiro Kudoh, et al. Stream processing with BigData: SSS-MapReduce, 2012, 4th IEEE International Conference on Cloud Computing Technology and Science Proceedings.

[12] Arthur W. Toga, et al. CUDA optimization strategies for compute- and memory-bound neuroimaging algorithms, 2012, Comput. Methods Programs Biomed.

[13] Ian T. Foster, et al. Grid information services for distributed resource sharing, 2001, Proceedings 10th IEEE International Symposium on High Performance Distributed Computing.

[14] Bartosz Przydatek. A Fast Approximation Algorithm for the Subset-sum Problem, 2002.

[15] Bingsheng He, et al. Mars: Accelerating MapReduce with Graphics Processors, 2011, IEEE Transactions on Parallel and Distributed Systems.

[16] Hai Jiang, et al. Accelerating MapReduce framework on multi-GPU systems, 2013, Cluster Computing.

[17] Kazuhiro Seki, et al. Parallel distributed trajectory pattern mining using MapReduce, 2012, 4th IEEE International Conference on Cloud Computing Technology and Science Proceedings.

[18] Gagan Agrawal, et al. Accelerating MapReduce on a coupled CPU-GPU architecture, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[19] Wu-chun Feng, et al. StreamMR: An Optimized MapReduce Framework for AMD GPUs, 2011, 2011 IEEE 17th International Conference on Parallel and Distributed Systems.

[20] R. V. van Nieuwpoort, et al. The Grid 2: Blueprint for a New Computing Infrastructure, 2003.

[21] Nathan Bell, et al. Thrust: A Productivity-Oriented Library for CUDA, 2012.

[22] Feng Ji, et al. Using Shared Memory to Accelerate MapReduce on Graphics Processing Units, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.