Optimising virtual machine allocation in MapReduce cloud for improved data locality

Big data is attracting growing attention. Although MapReduce processes big data successfully, it suffers from performance bottlenecks when deployed in the cloud, and data locality plays an important role in them. This paper focuses on improving data locality in a MapReduce cloud by allocating adjacent virtual machines (VMs) to execute MapReduce jobs. Good data locality reduces cross-network traffic and hence yields higher performance. When a user requests a set of VMs, the VMs are chosen based on their physical distance to one another. We first propose a greedy algorithm for creating a cluster of VMs; greedy methods, however, are not guaranteed to give an optimal solution. The second method allocates VMs using partitioning around medoids, which always finds a local minimum, so the resulting allocation may not be globally optimal. We also present a dynamic programming approach that is guaranteed to find an optimal solution from the users’ perspective.
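The idea of greedily building a cluster of nearby VMs can be illustrated with a minimal sketch. It is not the paper's algorithm, whose details the abstract does not give. It assumes a symmetric matrix of pairwise distances between candidate VMs (for example, network hop counts) and a cost defined as the sum of pairwise distances inside the cluster. The names `greedy_cluster`, `cluster_cost`, and `DIST`, along with the sample distances, are hypothetical.

```python
from itertools import combinations


def cluster_cost(dist, vms):
    """Sum of pairwise distances among the chosen VMs (assumed cost measure)."""
    return sum(dist[a][b] for a, b in combinations(vms, 2))


def greedy_cluster(dist, k):
    """Grow a cluster of k VMs from every possible seed and keep the cheapest."""
    n = len(dist)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of VMs")
    best = None
    for seed in range(n):
        chosen = [seed]
        while len(chosen) < k:
            remaining = [v for v in range(n) if v not in chosen]
            # Greedy step: add the VM closest (in total distance) to the cluster so far.
            nxt = min(remaining, key=lambda v: sum(dist[v][c] for c in chosen))
            chosen.append(nxt)
        if best is None or cluster_cost(dist, chosen) < cluster_cost(dist, best):
            best = chosen
    return sorted(best)


# Hypothetical hop counts: VMs 0-2 share one rack, VMs 3-4 share another.
DIST = [
    [0, 1, 1, 4, 4],
    [1, 0, 1, 4, 4],
    [1, 1, 0, 4, 4],
    [4, 4, 4, 0, 1],
    [4, 4, 4, 1, 0],
]

cluster = greedy_cluster(DIST, 3)
print(cluster, cluster_cost(DIST, cluster))
```

Each greedy step commits to the locally cheapest VM and never revisits that choice, which is why such a procedure can miss the globally cheapest cluster on less regular topologies.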
