A dynamic block device reconfiguration algorithm in virtual MapReduce cluster

With the advances of cloud computing and virtualization technologies, running MapReduce applications on clouds has attracted increasing attention in recent years. As a fundamental problem, however, the performance of MapReduce applications can sometimes be severely degraded by the overheads of I/O virtualization and by resource competition among virtual machines. In this paper, we propose a dynamic block device reconfiguration algorithm for virtual MapReduce clusters that reduces the data transfer time between virtual machines and thereby improves the performance of MapReduce applications running on clouds. The proposed algorithm uses a block device reconfiguration scheme in which a block device attached to one virtual machine can be detached at runtime and reattached to another virtual machine. This scheme allows files to be moved across virtual machines without any network transfer between them. The algorithm is also dynamic in the sense that it estimates the total data transfer time between virtual machines by multiple regression analysis on CPU utilization and data size, and adaptively selects the least-cost data transfer path between a mapper virtual machine and a reducer virtual machine. We have implemented the algorithm in Hadoop MapReduce. Benchmarking results show that the overhead of transferring data from mapper virtual machines to reducer virtual machines is minimized and that the execution times of MapReduce applications are shortened by up to 14%.
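The estimation and path-selection step can be illustrated with a minimal sketch, assuming a linear model T = b0 + b1 * cpu_util + b2 * data_size fitted separately for each candidate transfer path by ordinary least squares. The path names "network" and "block_device", the function names, and the synthetic coefficients are hypothetical illustrations, not taken from the paper; the authors' actual regression model, candidate paths, and Hadoop integration may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to a and b together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear predictors or too few samples")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


@dataclass
class TransferModel:
    """Hypothetical per-path model: T = b0 + b1 * cpu_util + b2 * data_size."""
    b0: float
    b1: float
    b2: float

    def predict(self, cpu_util: float, data_size: float) -> float:
        return self.b0 + self.b1 * cpu_util + self.b2 * data_size


def fit_transfer_model(samples: Sequence[Tuple[float, float, float]]) -> TransferModel:
    """Fit from (cpu_util, data_size, observed_time) samples via X^T X b = X^T y."""
    xs = [(1.0, cpu, size) for cpu, size, _ in samples]
    ys = [t for _, _, t in samples]
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(3)] for i in range(3)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(3)]
    return TransferModel(*solve_linear_system(xtx, xty))


def least_cost_path(models: Dict[str, TransferModel],
                    cpu_util: float, data_size: float) -> Tuple[str, float]:
    """Return the candidate path with the smallest predicted transfer time."""
    costs = {path: m.predict(cpu_util, data_size) for path, m in models.items()}
    best = min(costs, key=lambda p: costs[p])
    return best, costs[best]


def make_samples(b0: float, b1: float, b2: float) -> List[Tuple[float, float, float]]:
    """Synthetic, noise-free samples (hypothetical data, not measurements from the paper)."""
    points = [(0.1, 10.0), (0.5, 50.0), (0.9, 20.0), (0.3, 100.0), (0.7, 150.0)]
    return [(cpu, size, b0 + b1 * cpu + b2 * size) for cpu, size in points]


# Hypothetical paths: direct network copy vs. detaching/reattaching a block device.
models = {
    "network": fit_transfer_model(make_samples(0.5, 2.0, 0.10)),
    "block_device": fit_transfer_model(make_samples(3.0, 0.5, 0.02)),
}
print(least_cost_path(models, cpu_util=0.8, data_size=200.0)[0])  # prints "block_device"
print(least_cost_path(models, cpu_util=0.8, data_size=5.0)[0])    # prints "network"
```

With these made-up coefficients, the block device path wins for large intermediate data because its fixed detach/reattach cost is amortized, while small transfers stay on the network; refitting the models as CPU utilization changes is what makes the choice adaptive.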
