Job Dispatching and Scheduling for Heterogeneous Clusters -- A Case Study on the Billing Subsystem of CHT Telecommunication

Many enterprises and institutes are building private clouds within their own data centers. A data center may contain several batches of physical machines because of annual upgrades, but the number of machines is fixed most of the time. Consequently, in such heterogeneous yet mostly static environments, it is crucial to schedule jobs with different resource requirements and characteristics so that each job meets its timing constraints. This paper describes a cloud resource management framework that dynamically allocates and reallocates computation resources for jobs with different requirements, including deadline and priority. The framework makes decisions according to specified policies, and it provides four default policies from which system administrators can choose the one that fits their needs. The framework is designed to be component-pluggable: its components can be hot-swapped, i.e., replaced without shutting down the services. In addition, the framework can work as a standalone cloud computing system or as an extension of an existing cloud system. Our experimental results demonstrate that the system is capable of dynamically adjusting the resource allocation plan according to the run-time statistics it collects. The system also tolerates hardware failures and dynamically reallocates workers to compensate for the downtime so that jobs finish before their deadlines. Our experiments also suggest a trade-off between priority and deadline.
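
To make the policy-driven allocation described above more concrete, the following is a minimal Python sketch and not the framework's actual implementation. The names `Job`, `deadline_first`, `priority_first`, and `allocate`, the two example policies, and the greedy "fastest free worker first" rule are all assumptions introduced for illustration; the abstract does not specify what the four default policies are. The sketch shows a policy as a swappable function, heterogeneous workers modelled by a speed value, and reallocation after a failure as simply recomputing the plan without the failed worker.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Job:
    name: str
    priority: int          # assumption: larger value means more important
    deadline: float        # time left until the deadline, in seconds
    remaining_work: float  # work units still to be processed


# A policy maps a job to a sort key; jobs with smaller keys are served first.
Policy = Callable[[Job], Tuple[float, float]]


def deadline_first(job: Job) -> Tuple[float, float]:
    """Hypothetical policy: earliest deadline first, priority breaks ties."""
    return (job.deadline, -job.priority)


def priority_first(job: Job) -> Tuple[float, float]:
    """Hypothetical policy: highest priority first, deadline breaks ties."""
    return (-job.priority, job.deadline)


def allocate(jobs: List[Job], worker_speeds: Dict[str, float],
             policy: Policy) -> Dict[str, List[str]]:
    """Greedy plan: in policy order, give each job the fastest free workers
    until their combined speed (work units per second) is enough to finish
    the remaining work before the deadline, or no workers are left."""
    free = sorted(worker_speeds.items(), key=lambda kv: -kv[1])
    plan: Dict[str, List[str]] = {job.name: [] for job in jobs}
    for job in sorted(jobs, key=policy):
        needed = job.remaining_work / job.deadline  # required throughput
        got = 0.0
        while free and got < needed:
            worker, speed = free.pop(0)
            plan[job.name].append(worker)
            got += speed
    return plan


def reallocate_after_failure(jobs: List[Job], worker_speeds: Dict[str, float],
                             failed: str, policy: Policy) -> Dict[str, List[str]]:
    """Recompute the plan from scratch without the failed worker."""
    alive = {w: s for w, s in worker_speeds.items() if w != failed}
    return allocate(jobs, alive, policy)
```

Under these assumptions, swapping `deadline_first` for `priority_first` changes which job gets the fast machines, which is one simple way to see the trade-off between priority and deadline: a high-priority job with a loose deadline can take the fastest worker away from an urgent low-priority job.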
