Need for speed: CORA scheduler for optimizing completion-times in the cloud

There is an increasing need for cloud service performance that can be tailored to customer requirements. For jobs submitted to cloud computing clusters, a crucial requirement is the specification of job completion times. A natural way to model this specification is through client/job utility functions that depend on job completion times. We present a method to allocate and schedule heterogeneous resources to jointly optimize the utilities of jobs in a cloud. Specifically: (i) we formulate a completion-time optimal resource allocation (CORA) problem that apportions cluster resources across jobs while enforcing max-min fairness among job utilities; (ii) starting with an integer programming problem, we perform a series of steps to transform it into an equivalent linear programming problem; (iii) we implement the proposed framework as a utility-aware resource scheduler in the widely used Hadoop data processing framework; and (iv) through extensive experiments with real-world datasets, we show that our prototype achieves significant performance improvement over existing resource-allocation policies.
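
To make the idea of max-min fairness among job utilities concrete, the following is a minimal sketch, not the CORA formulation itself. It assumes a single divisible resource, a completion time equal to work divided by allocated capacity, and a hypothetical linearly decaying utility. Instead of the integer-to-linear programming transformation described above, it uses a plain bisection on the common utility level. All names (`utility`, `resources_needed`, `max_min_allocation`) and the job parameters are illustrative assumptions.

```python
from typing import List, Tuple

# A job is (work, zero_utility_time): the amount of work to process and the
# completion time at which its utility drops to zero. Both are assumptions
# made for illustration only.
Job = Tuple[float, float]


def utility(completion_time: float, zero_utility_time: float) -> float:
    """Hypothetical utility: 1 at time 0, decaying linearly to 0."""
    return max(0.0, 1.0 - completion_time / zero_utility_time)


def resources_needed(job: Job, level: float) -> float:
    """Smallest allocation that gives `job` a utility of at least `level`.

    With completion_time = work / allocation, utility >= level holds
    exactly when completion_time <= zero_utility_time * (1 - level).
    """
    work, zero_utility_time = job
    return work / (zero_utility_time * (1.0 - level))


def max_min_allocation(
    jobs: List[Job], capacity: float, iters: int = 60
) -> Tuple[float, List[float]]:
    """Bisect on the largest utility level every job can reach at once."""
    if sum(resources_needed(j, 0.0) for j in jobs) > capacity:
        raise ValueError("capacity too small for any positive utility")
    lo, hi = 0.0, 1.0  # invariant: level `lo` is always feasible
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        feasible = mid < 1.0 and (
            sum(resources_needed(j, mid) for j in jobs) <= capacity
        )
        if feasible:
            lo = mid
        else:
            hi = mid
    return lo, [resources_needed(j, lo) for j in jobs]


if __name__ == "__main__":
    jobs = [(10.0, 20.0), (30.0, 20.0)]
    level, allocation = max_min_allocation(jobs, capacity=4.0)
    print(level, allocation)  # 0.5 [1.0, 3.0]
```

In this toy instance the larger job receives three times the capacity of the smaller one, so both finish at time 10 and reach the same utility of 0.5. Raising either job's utility would require lowering the other's, which is the max-min fair point under these assumptions.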
