Experiences with resource provisioning for scientific workflows using Corral

The development of grid and workflow technologies has enabled complex, loosely coupled scientific applications to be executed on distributed resources. Many of these applications consist of large numbers of short-duration tasks whose runtimes are heavily influenced by delays in the execution environment. Such applications often perform poorly on the grid because of the large scheduling overheads commonly found in grids. In this paper we present a provisioning system based on multi-level scheduling that improves workflow runtime by reducing scheduling overheads. The system reserves resources for the exclusive use of the application and gives applications control over scheduling policies. We describe our experiences with the system when running a suite of real workflow-based applications, including applications in astronomy, earthquake science, and genomics. Provisioning resources with Corral ahead of workflow execution reduced the runtime of the astronomy application by up to 78% (45% on average) and the runtime of a genome mapping application by an order of magnitude when compared to traditional methods. We also show how provisioning can benefit applications both on a small local cluster and on a large-scale campus resource.
