Towards Efficient MapReduce Using MPI

MapReduce is an emerging programming paradigm for data-parallel applications. We discuss common strategies for implementing a MapReduce runtime and propose an optimized implementation on top of MPI. Our implementation combines the redistribution and reduce phases and moves them into the network. This approach especially benefits applications with a limited number of output keys in the map phase. We also show how anticipated MPI-2.2 and MPI-3 features, such as MPI_Reduce_local and nonblocking collective operations, can be used to implement and optimize MapReduce, yielding a performance improvement of up to 25% on 127 cluster nodes. Finally, we discuss additional features that would enable MPI to support all MapReduce applications more efficiently.
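To make the approach concrete, below is a minimal sketch (not the paper's actual implementation) of a histogram-style MapReduce over MPI, assuming a small, fixed key space in the map output. NKEYS, CHUNKS, and map_chunk are hypothetical stand-ins for an application's key space and map function. The local combine step uses MPI_Reduce_local (added in MPI-2.2), and the cross-node redistribution and reduce are fused into a single collective, here the nonblocking MPI_Ireduce (MPI-3), so the reduction can proceed inside the network and overlap with remaining local work.

/* Hedged sketch: histogram-style MapReduce on MPI.  NKEYS, CHUNKS, and
 * map_chunk() are hypothetical placeholders, not from the paper. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NKEYS  16   /* small, fixed key space (assumption) */
#define CHUNKS 4    /* number of local input chunks (assumption) */

/* Hypothetical map function: emits (key, 1) pairs for one input chunk,
 * accumulated directly into a local partial-count array. */
static void map_chunk(int chunk, int rank, long counts[NKEYS]) {
    srand(rank * CHUNKS + chunk);
    for (int i = 0; i < 1000; ++i)
        counts[rand() % NKEYS] += 1;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long local[NKEYS]  = {0};  /* running local combination of map outputs */
    long global[NKEYS] = {0};
    long chunk_out[NKEYS];

    /* Map each chunk, then merge it into the running local result with
     * MPI_Reduce_local: this plays the role of the MapReduce "combiner". */
    for (int c = 0; c < CHUNKS; ++c) {
        for (int k = 0; k < NKEYS; ++k) chunk_out[k] = 0;
        map_chunk(c, rank, chunk_out);
        MPI_Reduce_local(chunk_out, local, NKEYS, MPI_LONG, MPI_SUM);
    }

    /* Fused redistribution + reduce: one collective sums the per-rank
     * partial counts in the network instead of shuffling key/value pairs
     * and reducing afterwards.  The nonblocking variant allows overlap. */
    MPI_Request req;
    MPI_Ireduce(local, global, NKEYS, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD, &req);
    /* ... further local map work could overlap with the reduction here ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    if (rank == 0) {
        printf("reduced over %d ranks\n", size);
        for (int k = 0; k < NKEYS; ++k)
            printf("key %d -> %ld\n", k, global[k]);
    }

    MPI_Finalize();
    return 0;
}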
