Reliable MapReduce computing on opportunistic resources

MapReduce offers an easy-to-use programming paradigm for processing large data sets, making it an attractive model for opportunistic compute resources. However, unlike the dedicated resources on which MapReduce has mostly been deployed, opportunistic resources exhibit significantly higher rates of node volatility. As a consequence, the data and task replication scheme adopted by existing MapReduce implementations is woefully inadequate on such volatile resources. In this paper, we propose MOON, short for MapReduce On Opportunistic eNvironments, which is designed to offer reliable MapReduce service for opportunistic computing. MOON adopts a hybrid resource architecture, supplementing opportunistic compute resources with a small set of dedicated resources, and extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms that take advantage of this hybrid architecture. Our results on an emulated opportunistic computing system running atop a 60-node cluster demonstrate that MOON delivers significant performance improvements over Hadoop on volatile compute resources and can even complete jobs that Hadoop alone cannot finish.

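To make the hybrid placement idea concrete, the following is a minimal, hypothetical sketch of the kind of replica placement such an architecture implies: anchor one copy of each data block on a dedicated node and choose the number of additional copies on volatile (opportunistic) nodes from their observed unavailability. This is not MOON's or Hadoop's actual code; the class, node fields, and the target-availability parameter are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch of hybrid-aware replica placement: one anchor copy on
 * a dedicated node, plus enough copies on volatile nodes to meet a target
 * block availability. Not MOON's implementation.
 */
public class HybridPlacementSketch {

    /** A storage node; dedicated nodes are assumed to be rarely unavailable. */
    record Node(String id, boolean dedicated, double unavailability) {}

    /** Smallest r such that unavailability^r <= 1 - target. */
    static int volatileReplicas(double unavailability, double target) {
        return (int) Math.ceil(Math.log(1.0 - target) / Math.log(unavailability));
    }

    static List<Node> placeBlock(List<Node> cluster, double target) {
        List<Node> chosen = new ArrayList<>();

        // One anchor copy on a dedicated node (assumed to have spare capacity).
        cluster.stream().filter(Node::dedicated).findFirst().ifPresent(chosen::add);

        // Extra copies on the least-volatile opportunistic nodes, sized
        // pessimistically from the worst observed unavailability.
        List<Node> volatiles = cluster.stream()
                .filter(n -> !n.dedicated())
                .sorted(Comparator.comparingDouble(Node::unavailability))
                .toList();
        if (volatiles.isEmpty()) {
            return chosen;
        }
        double worstUnavail = volatiles.get(volatiles.size() - 1).unavailability();
        int extra = volatileReplicas(worstUnavail, target);
        chosen.addAll(volatiles.subList(0, Math.min(extra, volatiles.size())));
        return chosen;
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
                new Node("dedicated-0", true, 0.01),
                new Node("desktop-1", false, 0.40),
                new Node("desktop-2", false, 0.30),
                new Node("desktop-3", false, 0.50));
        placeBlock(cluster, 0.99)
                .forEach(n -> System.out.println("replica on " + n.id()));
    }
}
```

Under these assumptions, the dedicated copy bounds worst-case data loss while the volatile replica count adapts to how unreliable the opportunistic pool currently is, which is the intuition behind supplementing volatile nodes with a small dedicated set.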