Availability and Network-Aware MapReduce Task Scheduling over the Internet

MapReduce offers an easy-to-use programming paradigm for processing large datasets. In our previous work, we designed BitDew-MapReduce, a MapReduce framework for desktop grid and volunteer computing environments that allows non-expert users to run data-intensive MapReduce jobs on volunteer resources over the Internet. However, network distance and resource availability have a great impact on MapReduce applications running over the Internet. To address this, we propose an availability- and network-aware MapReduce framework for the Internet. Simulation results show that MapReduce job response time can be reduced by 27.15%, thanks to Naive Bayes classifier-based availability prediction and landmark-based network estimation.
