Scheduling Parallel Tasks onto Opportunistically Available Cloud Resources

We consider the problem of opportunistically scheduling low-priority tasks onto cloud computation resources left underutilized by high-priority tasks. To avoid conflicts when the high-priority tasks resume, the scheduler must either suspend the low-priority tasks (causing waiting) or move them to other underutilized servers (causing migration). The goal of opportunistic scheduling is to place the low-priority tasks onto intermittently available server resources while minimizing the combined cost of waiting and migration. Moreover, we aim to support multiple parallel low-priority tasks with synchronization constraints. Under the assumption that each server's availability to low-priority tasks can be modeled as an ON/OFF Markov chain, we have shown that the optimal solution requires solving a Markov Decision Process (MDP) of exponential complexity, and that efficient solutions are known only when the servers behave homogeneously. In this paper, we propose an efficient heuristic scheduling policy by formulating the problem as a restless Multi-Armed Bandit (MAB) problem under relaxed synchronization. We prove the indexability of the problem and provide closed-form formulas for computing the indices. Our evaluation using real data center traces shows that the measured performance closely matches the predictions of the Markov chain model, and that the proposed index policy performs consistently well across various server dynamics compared with existing policies.
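The ON/OFF Markov chain assumption can be illustrated with a minimal sketch. The function names, the per-slot transition probabilities `p_off_to_on` and `p_on_to_off`, and the discrete time-slot model are assumptions made for illustration only; they are not taken from the paper, and the sketch does not reproduce the paper's index formulas. It shows only the standard two-state chain quantities that a wait-versus-migrate decision would plausibly depend on: the long-run fraction of time a server is available, and the expected length of an unavailable period.

```python
import random


def stationary_on_probability(p_off_to_on, p_on_to_off):
    """Long-run fraction of slots in which the server is ON (available)."""
    return p_off_to_on / (p_off_to_on + p_on_to_off)


def expected_off_duration(p_off_to_on):
    """Mean number of slots spent OFF before returning to ON (geometric)."""
    return 1.0 / p_off_to_on


def simulate_availability(p_off_to_on, p_on_to_off, steps, seed=0):
    """Simulate the two-state chain and return the observed ON fraction."""
    rng = random.Random(seed)
    on = True  # hypothetical initial state: server starts available
    on_slots = 0
    for _ in range(steps):
        if on:
            on_slots += 1
            # High-priority work resumes with probability p_on_to_off.
            if rng.random() < p_on_to_off:
                on = False
        else:
            # Server becomes available again with probability p_off_to_on.
            if rng.random() < p_off_to_on:
                on = True
    return on_slots / steps


if __name__ == "__main__":
    p, q = 0.3, 0.1  # illustrative values, not from the paper
    print("stationary ON probability:", stationary_on_probability(p, q))
    print("expected OFF duration:", expected_off_duration(p))
    print("simulated ON fraction:", simulate_availability(p, q, 200_000))
```

Under this model, a long expected OFF duration makes waiting costly relative to migration, which is the trade-off the scheduling policy has to balance.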
