Towards MapReduce for Desktop Grid Computing

MapReduce is an emerging programming model for data-intensive applications, proposed by Google, that has recently attracted a lot of attention. MapReduce borrows from functional programming: the programmer defines Map and Reduce tasks that are executed on a large set of distributed data. In this paper we propose an implementation of the MapReduce programming model. We present the architecture of the prototype, which is based on BitDew, a middleware for large-scale data management on Desktop Grids. We describe the set of features that makes our approach suitable for large-scale and loosely connected Internet Desktop Grids: massive fault tolerance, replica management, barrier-free execution, latency-hiding optimisation, and distributed result checking. We also present a performance evaluation of the prototype on micro-benchmarks and on a real MapReduce application. The scalability test shows that we achieve linear speedup on the classical Word Count benchmark. Several scenarios involving lagger (slow) hosts and host crashes demonstrate that the prototype is able to cope with an experimental context similar to the real-world Internet.
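To make the programming model concrete, the sketch below runs the Word Count benchmark in a single process using the generic Map/shuffle/Reduce structure. It is an illustration under stated assumptions, not the BitDew-based prototype: the function names (`map_word_count`, `reduce_word_count`, `run_map_reduce`) and the `n_reducers` parameter are hypothetical, and the CRC32 partitioner is one common choice rather than the paper's documented scheme.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_word_count(chunk: str) -> Iterator[Tuple[str, int]]:
    """Map task: emit (word, 1) for every word in one input chunk."""
    for word in chunk.split():
        yield word.lower(), 1


def reduce_word_count(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce task: sum all partial counts collected for one word."""
    return word, sum(counts)


def partition(word: str, n_reducers: int) -> int:
    """Hypothetical partitioner: CRC32 is deterministic across processes,
    so independent hosts would agree on which reducer owns a key."""
    return zlib.crc32(word.encode("utf-8")) % n_reducers


def run_map_reduce(chunks: List[str], n_reducers: int = 2) -> Dict[str, int]:
    # Shuffle phase: route each intermediate (word, 1) pair to its reducer bucket.
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(n_reducers)]
    for chunk in chunks:  # each chunk stands in for one Map task's input
        for word, one in map_word_count(chunk):
            buckets[partition(word, n_reducers)][word].append(one)

    # Reduce phase: each bucket stands in for one Reduce task.
    result: Dict[str, int] = {}
    for bucket in buckets:
        for word, counts in bucket.items():
            key, total = reduce_word_count(word, counts)
            result[key] = total
    return result


if __name__ == "__main__":
    chunks = ["the quick brown fox", "the lazy dog", "The fox"]
    print(sorted(run_map_reduce(chunks).items()))
    # [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

Because each word is routed to exactly one reducer, the final counts do not depend on `n_reducers`; in a distributed setting the chunks and buckets would instead be spread over separate hosts.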
