Analysing Hadoop performance in a multi-user IaaS Cloud

Over the last few years, Big Data analysis (i.e., crunching enormous amounts of data from different sources to extract useful knowledge for improving business objectives) has attracted huge attention from enterprises and research institutions. One of the most successful paradigms for analysing such volumes of data is MapReduce, and particularly Hadoop, its open source implementation. However, Hadoop-based applications require massive amounts of resources to conduct different analyses of large amounts of data. These growing demands that research and enterprise place on current computing infrastructures drive the adoption of Cloud computing, where there is an increasing demand for Hadoop as a Service. Since Hadoop requires a distributed environment in order to operate, a significant problem is where resources are located. In Cloud environments, this problem lies mainly in the criteria for Virtual Machine (VM) placement. The work presented in this paper analyses the performance, power consumption and resource usage of Hadoop applications deployed on Virtual Clusters (VCs) within a private IaaS Cloud. More precisely, it measures the impact of different VM placement strategies on Hadoop-based application performance, power consumption and resource usage. As a result, some conclusions on the optimal criteria for VM deployment are provided.
