A Learning Architecture for Scheduling Workflow Applications in the Cloud

The scheduling of workflow applications involves mapping individual workflow tasks to computational resources, based on a range of functional and non-functional quality of service requirements. Workflow applications such as scientific workflows often require extensive computational processing and generate significant amounts of experimental data. The emergence of cloud computing has introduced a utility-type market model, where computational resources of varying capacities can be procured on demand, in a pay-per-use fashion. In workflow-based applications, dependencies exist amongst tasks, which requires schedules to be generated in accordance with defined precedence constraints. These constraints pose a difficult planning problem, since a task may be scheduled for execution only once all of its parent tasks have completed. In general, the two most important objectives of workflow schedulers are the minimisation of cost and of makespan. The cost of workflow execution consists of the computational costs incurred from processing individual tasks and the data transmission costs. With scientific workflows, potentially large amounts of data must be transferred between compute and storage sites. This paper proposes a novel cloud workflow scheduling approach which employs a Markov Decision Process to optimally guide the workflow execution process depending on the environmental state. In addition, the system employs a genetic algorithm to evolve workflow schedules. The overall architecture is presented, and initial results indicate the potential of this approach for developing viable workflow schedules on the Cloud.
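To make the two objectives concrete, the following is a minimal sketch of how a candidate schedule could be scored for makespan and cost while respecting precedence constraints. It is an illustration only, not the architecture proposed in the paper: the four-task workflow, the resource names, speeds and prices, the flat `TRANSFER_COST`, and the `evaluate` function are all hypothetical. A function of this kind is the sort of fitness measure a genetic algorithm might use to compare schedules.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical four-task workflow: each task maps to the set of its parents.
PARENTS = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
# Hypothetical amount of work per task, in abstract work units.
WORK = {"A": 4.0, "B": 8.0, "C": 6.0, "D": 2.0}
# Hypothetical resources: (speed in work units per hour, price per hour).
RESOURCES = {"small": (1.0, 1.0), "large": (2.0, 3.0)}
# Assumed flat charge for each dependency whose endpoints run on different resources.
TRANSFER_COST = 0.5


def evaluate(mapping):
    """Return (makespan, cost) for a task -> resource mapping."""
    finish = {}                              # finish time of each task
    free_at = {r: 0.0 for r in RESOURCES}    # time at which each resource is idle
    cost = 0.0
    # static_order() yields every task after all of its parents.
    for task in TopologicalSorter(PARENTS).static_order():
        res = mapping[task]
        speed, price = RESOURCES[res]
        # Precedence constraint: wait until every parent has finished.
        ready = max((finish[p] for p in PARENTS[task]), default=0.0)
        start = max(ready, free_at[res])
        duration = WORK[task] / speed
        finish[task] = start + duration
        free_at[res] = finish[task]
        # Computational cost plus data transmission cost.
        cost += duration * price
        cost += TRANSFER_COST * sum(1 for p in PARENTS[task] if mapping[p] != res)
    return max(finish.values()), cost


all_small = {t: "small" for t in PARENTS}
mixed = {"A": "small", "B": "large", "C": "small", "D": "large"}
print(evaluate(all_small))  # (20.0, 20.0)
print(evaluate(mixed))      # (11.0, 26.0)
```

In this toy setting the mixed mapping shortens the makespan from 20 to 11 hours but raises the cost from 20 to 26, which illustrates the trade-off between the two objectives.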
