A New Approach to the Cloud-Based Heterogeneous MapReduce Placement Problem

Guaranteeing quality of service (QoS) at minimum computation cost is the most important objective of cloud-based MapReduce computations, and the total computation cost is minimized through MapReduce placement optimization. MapReduce placement optimization approaches fall into two categories: homogeneous and heterogeneous. Heterogeneous MapReduce placement optimization is generally believed to be more effective than homogeneous MapReduce placement optimization at reducing the total running cost of cloud-based MapReduce computations. This paper proposes a new approach to the heterogeneous MapReduce placement optimization problem. In this approach, the problem is transformed into a constrained combinatorial optimization problem and solved by an innovative constructive algorithm. Experimental results show that, with this approach, the running cost of the cloud-based MapReduce computation platform is 24.3-44.0 percent lower than with the most popular homogeneous MapReduce placement approach, and 2.0-36.2 percent lower than with the heterogeneous MapReduce placement approach that does not consider the spare resources of existing MapReduce computations. The experimental results also demonstrate the good scalability of the new approach.
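The cost argument above (a mix of VM types can be cheaper than a single type, and reusing spare resources of existing computations is cheaper still) can be illustrated with a small sketch. It rests on a deliberately simplified, hypothetical model: each VM type offers a fixed number of task slots at a fixed hourly price, a job needs a given number of slots, and the type names `small`/`medium`/`large`, their slot counts and their prices are all invented. The heterogeneous plan is found with an exact dynamic program for a one-dimensional covering problem; this is not the paper's constructive algorithm, and it ignores deadlines, data locality and multi-dimensional resources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMType:
    name: str
    slots: int        # hypothetical: task slots one VM of this type provides
    price_cents: int  # hypothetical: hourly price in cents


# Invented VM catalogue for illustration only; not taken from the paper.
TYPES = [
    VMType("small", 2, 10),
    VMType("medium", 4, 18),
    VMType("large", 8, 34),
]


def homogeneous_cost(demand, types):
    """Cheapest plan that rents a single VM type for the whole demand."""
    if demand <= 0:
        return 0
    # Ceiling division gives the number of VMs of type t needed.
    return min((demand + t.slots - 1) // t.slots * t.price_cents for t in types)


def heterogeneous_plan(demand, types):
    """Cheapest mix of VM types providing at least `demand` slots.

    Exact DP for a one-dimensional covering problem:
    best[d] = min over t of best[max(0, d - t.slots)] + t.price_cents
    """
    if demand <= 0:
        return 0, []
    best = [0] + [float("inf")] * demand
    pick = [None] * (demand + 1)
    for d in range(1, demand + 1):
        for t in types:
            cand = best[max(0, d - t.slots)] + t.price_cents
            if cand < best[d]:
                best[d], pick[d] = cand, t
    # Walk the recorded choices back to recover which VMs to rent.
    plan, d = [], demand
    while d > 0:
        t = pick[d]
        plan.append(t.name)
        d = max(0, d - t.slots)
    return best[demand], sorted(plan)


def plan_with_spare(demand, spare_slots, types):
    """Use spare slots on already-running VMs first, then rent the rest."""
    return heterogeneous_plan(max(0, demand - spare_slots), types)


if __name__ == "__main__":
    print(homogeneous_cost(10, TYPES))    # 50
    print(heterogeneous_plan(10, TYPES))  # (44, ['large', 'small'])
    print(plan_with_spare(10, 2, TYPES))  # (34, ['large'])
```

With these invented numbers, the best single VM type costs 50 cents per hour for a 10-slot job, the mixed plan costs 44, and reusing two spare slots lowers it to 34. The exact DP is pseudo-polynomial in the demand and handles only one resource dimension; once deadlines and several resource types enter the model, exact methods become much harder to apply, which is one reason heuristic approaches are used.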