Node capability-aware resource provisioning in a heterogeneous cloud

Although MapReduce, a core technology of cloud computing, lowers the barrier to entry for parallel computing, it introduces another challenging research issue: improving its performance through proper resource provisioning. This issue becomes more complex in a heterogeneous cloud running multiple jobs, since the nodes differ in capability and workload and the limited resources must be shared among all jobs. In this paper, the resulting optimization problem, called the Node Capability-aware Provisioning Problem (NCPP), is first formulated as a mathematical model whose objective is to minimize job execution time, which is influenced by node capability, subject to the resource constraints of the nodes in the cloud. A node Capability-Aware Resource Provisioner (CARP) is then proposed and implemented on Apache Hadoop to demonstrate that NCPP can be solved in a systematic way.
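For intuition only, a minimal makespan-style sketch of such a formulation might look as follows. The notation here (job set J, node set N, node capability c_n, per-node resource capacity R_n, job workload w_j, allocation variables x_{j,n}, and completion time T) is assumed for illustration and does not reproduce the paper's actual NCPP model.

\begin{align}
\min_{x,\,T}\quad & T \\
\text{s.t.}\quad & \sum_{n \in N} x_{j,n} = w_j, && \forall j \in J, \\
& \frac{1}{c_n} \sum_{j \in J} x_{j,n} \le T, && \forall n \in N, \\
& \sum_{j \in J} x_{j,n} \le R_n, && \forall n \in N, \\
& x_{j,n} \ge 0, && \forall j \in J,\ \forall n \in N.
\end{align}

In a sketch of this kind, minimizing T captures the goal of minimizing job execution time on nodes of unequal capability, while the capacity bound R_n plays the role of the per-node resource constraints; the paper's own model may define both quite differently.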
