Performance Prediction Model in Heterogeneous MapReduce Environments

MapReduce has emerged as a popular computing model for parallel processing in cloud computing. Performance analysis and modeling of MapReduce jobs are needed to guide performance optimization and job scheduling. However, building a performance model is difficult in heterogeneous MapReduce environments because workload behavior varies widely and cluster nodes are heterogeneous. To address these issues, we propose a novel performance prediction model for MapReduce in heterogeneous environments. The model consists of two components: (1) a performance prediction model based on machine learning and (2) optimal parameter selection based on an immune algorithm. Experimental results show that our model can accurately forecast the performance of MapReduce jobs running on heterogeneous MapReduce systems.
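The two-component structure can be illustrated with a minimal sketch. The abstract does not say which learner or which immune-algorithm variant is used, so everything below is an assumption: kernel ridge regression with an RBF kernel stands in for the machine-learning predictor, a simple clonal-selection-style loop stands in for the immune algorithm, and the features (input size, node speed), the synthetic runtime formula, and all function names are hypothetical.

```python
import math
import random

def rbf(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit(X, y, gamma, sigma):
    """Component 1 (assumed learner): kernel ridge regression.
    Solves (K + I/gamma) alpha = y and returns a predictor function."""
    n = len(X)
    K = [[rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    return lambda q: sum(a * rbf(x, q, sigma) for a, x in zip(alpha, X))

def val_error(params, train, val):
    """Mean squared error on the validation set for (gamma, sigma)."""
    gamma, sigma = params
    model = fit(train[0], train[1], gamma, sigma)
    return sum((model(x) - t) ** 2 for x, t in zip(*val)) / len(val[1])

def clamp(v, lo, hi):
    return min(max(v, lo), hi)

def immune_search(train, val, rng, pop=8, gens=15):
    """Component 2 (assumed variant): clonal-selection-style search.
    Antibodies are (gamma, sigma) pairs; affinity is low validation error."""
    def random_antibody():
        return (10 ** rng.uniform(-1, 3), 10 ** rng.uniform(-1, 1))
    def mutate(ab):
        g, s = ab
        return (clamp(g * 10 ** rng.gauss(0, 0.2), 1e-1, 1e3),
                clamp(s * 10 ** rng.gauss(0, 0.2), 1e-1, 1e1))
    antibodies = [random_antibody() for _ in range(pop)]
    for _ in range(gens):
        antibodies.sort(key=lambda p: val_error(p, train, val))
        elite = antibodies[: pop // 2]            # keep the best half
        clones = [mutate(ab) for ab in elite]     # clone and hypermutate
        antibodies = elite + clones[: pop - len(elite) - 1] + [random_antibody()]
    return min(antibodies, key=lambda p: val_error(p, train, val))

def make_data(rng, n):
    """Hypothetical jobs: features are (input size, node speed factor)."""
    X = [[rng.uniform(1, 10), rng.uniform(0.5, 2.0)] for _ in range(n)]
    y = [10.0 * size / speed + 5.0 for size, speed in X]
    return X, y

rng = random.Random(0)
train, val = make_data(rng, 20), make_data(rng, 10)
best = immune_search(train, val, rng)
err = val_error(best, train, val)
print("selected (gamma, sigma):", best, "validation MSE:", err)
```

Because the best half of the population is carried over unchanged in each generation, the selected parameters are never worse on the validation set than the best initial candidate. A real evaluation would use measured job profiles and cluster node characteristics in place of the synthetic data.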
