High-Throughput Scientific Workflow Scheduling under Deadline Constraint in Clouds

Cloud computing is a paradigm shift in service delivery that promises a leap in efficiency and flexibility in the use of computing resources. As cloud infrastructures are deployed widely around the globe, many data- and compute-intensive scientific workflows have been moved from traditional high-performance computing platforms and grids to clouds. With the rapidly increasing number of cloud users in various science domains, it has become critical for the cloud service provider to schedule jobs efficiently while still guaranteeing the workflow completion time specified in the Service Level Agreement (SLA). Based on practical models for cloud utilization, we formulate a delay-constrained workflow optimization problem that maximizes resource utilization for high system throughput, and we propose a two-step scheduling algorithm that minimizes the cloud overhead under a user-specified execution time bound. Extensive simulation results illustrate that the proposed algorithm achieves lower computing overhead or higher resource utilization than existing methods under the execution time bound, and that it also significantly reduces the total workflow execution time by strategically selecting appropriate mapping nodes for prioritized modules.

Index Terms: computing
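The abstract does not spell out the two steps of the proposed algorithm, so the following is only an illustrative sketch of one generic way to schedule a workflow under a deadline, not the authors' method. It assumes a workflow given as a DAG of modules with known workloads, and a set of hypothetical VM types that each have a speed and a price per time unit. Step 1 maps every module to the fastest VM type to check that the deadline is feasible; step 2 greedily remaps one module at a time to a cheaper VM type as long as the end-to-end delay stays within the bound. All names (`schedule`, `vm_types`, `workload`, and so on) are assumptions made for this example.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def makespan(deps, runtime):
    """End-to-end delay of a DAG; deps maps module -> set of predecessors."""
    finish = {}
    for m in TopologicalSorter(deps).static_order():
        start = max((finish[p] for p in deps.get(m, ())), default=0.0)
        finish[m] = start + runtime[m]
    return max(finish.values())


def runtimes(workload, vm_types, mapping):
    """Execution time of each module on its mapped VM type (workload / speed)."""
    return {m: workload[m] / vm_types[mapping[m]][0] for m in workload}


def total_cost(workload, vm_types, mapping):
    """Sum over modules of execution time times price per time unit."""
    rt = runtimes(workload, vm_types, mapping)
    return sum(rt[m] * vm_types[mapping[m]][1] for m in workload)


def schedule(deps, workload, vm_types, deadline):
    """Hypothetical two-step heuristic; vm_types maps name -> (speed, price)."""
    # Step 1 (assumed): map everything to the fastest type and test feasibility.
    fastest = max(vm_types, key=lambda v: vm_types[v][0])
    mapping = {m: fastest for m in workload}
    if makespan(deps, runtimes(workload, vm_types, mapping)) > deadline:
        raise ValueError("deadline cannot be met even on the fastest VM type")

    # Step 2 (assumed): repeatedly apply the single remapping that saves the
    # most cost while keeping the end-to-end delay within the deadline.
    while True:
        current = total_cost(workload, vm_types, mapping)
        best = None
        for m in workload:
            for v in vm_types:
                if v == mapping[m]:
                    continue
                trial = {**mapping, m: v}
                saving = current - total_cost(workload, vm_types, trial)
                delay = makespan(deps, runtimes(workload, vm_types, trial))
                if saving > 1e-12 and delay <= deadline:
                    if best is None or saving > best[0]:
                        best = (saving, m, v)
        if best is None:
            return mapping
        mapping[best[1]] = best[2]


# Example workflow: a -> {b, c} -> d, with two hypothetical VM types.
deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
workload = {"a": 10, "b": 20, "c": 5, "d": 10}
vm_types = {"fast": (2.0, 3.0), "slow": (1.0, 1.0)}
plan = schedule(deps, workload, vm_types, deadline=25)
```

Each accepted remapping strictly lowers the cost, so the loop terminates. A greedy pass like this gives no optimality guarantee; it is meant only to make the notion of trading cost against an execution time bound concrete.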
