Grex: An efficient MapReduce framework for graphics processing units

In this paper, we present Grex, a new MapReduce framework designed to leverage general-purpose graphics processing units (GPUs) for parallel data processing. Grex provides several new features. First, it supports a parallel split method that uses GPU threads to tokenize variable-size input data, such as words in e-books or URLs in web documents. Second, Grex evenly distributes data to map/reduce tasks to avoid data partitioning skew. Third, Grex provides a new memory management scheme that improves performance by exploiting the GPU memory hierarchy. Notably, all these capabilities are supported through careful system design, without requiring any locks or atomic operations for thread synchronization. The experimental results show that Grex is up to 12.4x and 4.1x faster than two state-of-the-art GPU-based MapReduce frameworks for the tested applications.
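
To make the idea of a lock-free parallel split more concrete, the following is a minimal CPU-side sketch in Python. It is not Grex's implementation, which the abstract does not detail. It assumes a common GPU pattern: each position independently flags whether it begins a token, and a prefix sum then assigns every token a unique output slot, so no two writers ever collide. The names `parallel_split`, `even_chunks`, and `DELIMS` are hypothetical, and the loops stand in for what would be independent GPU threads.

```python
from itertools import accumulate

# Assumed delimiter set for illustration; a real application would choose its own.
DELIMS = b" \t\n"


def parallel_split(data: bytes) -> list:
    """Tokenize variable-size input using only per-position independent steps."""
    n = len(data)
    # Step 1: every position decides on its own whether a token starts there.
    # On a GPU, each index i would be handled by a separate thread.
    flags = [
        1 if data[i] not in DELIMS and (i == 0 or data[i - 1] in DELIMS) else 0
        for i in range(n)
    ]
    # Step 2: an inclusive prefix sum turns the flags into unique slot numbers.
    inclusive = list(accumulate(flags))
    total = inclusive[-1] if inclusive else 0
    # Step 3: each flagged position writes to its own slot, so the writes
    # never conflict and need no locks or atomic operations.
    starts = [0] * total
    for i in range(n):
        if flags[i]:
            starts[inclusive[i] - 1] = i
    # Step 4: each token is extended independently until the next delimiter.
    tokens = []
    for s in starts:
        e = s
        while e < n and data[e] not in DELIMS:
            e += 1
        tokens.append(data[s:e])
    return tokens


def even_chunks(items: list, workers: int) -> list:
    """Give each worker an equal share of items (sizes differ by at most one)."""
    base, extra = divmod(len(items), workers)
    chunks, pos = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        chunks.append(items[pos:pos + size])
        pos += size
    return chunks
```

For example, `even_chunks(parallel_split(b"map  reduce on\tGPU"), 2)` splits the input into four tokens and hands two to each worker. Dividing by token count rather than by raw byte ranges is one simple way to avoid giving some tasks far more work than others, under the assumption that per-token cost is roughly uniform.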
