Enabling large-scale scientific workflows on petascale resources using MPI master/worker

Computational scientists often need to execute large, loosely coupled parallel applications, such as workflows and bags of tasks, in order to do their research. These applications are typically composed of many short-running serial tasks, which frequently demand large amounts of computation and storage. To produce results in a reasonable amount of time, scientists would like to execute these applications on petascale resources. In the past, this has been a challenge because petascale systems are not designed to execute such workloads efficiently. In this paper we describe a new approach to executing large, fine-grained workflows on distributed petascale systems. Our solution involves partitioning the workflow into independent subgraphs and then submitting each subgraph as a self-contained MPI job to the available resources, which are often remote. We describe how the partitioning and job management have been implemented in the Pegasus Workflow Management System. We also explain how this approach provides an end-to-end solution for challenges related to system architecture, queue policies and priorities, and application reuse and development. Finally, we describe how the system is being used to enable the execution of a very large seismic hazard analysis application on XSEDE resources.
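The partition-then-dispatch idea can be illustrated with a minimal sketch. It is not the Pegasus implementation: it assumes that "independent subgraphs" can be approximated by the weakly connected components of the task graph, and it simulates the master/worker loop in plain Python rather than with MPI. The function names `partition_independent` and `master_schedule` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def partition_independent(tasks, edges):
    """Group tasks into weakly connected components of the DAG.

    Assumption: tasks with no dependency path between them (in either
    direction) can run as separate, self-contained jobs.
    """
    neighbors = defaultdict(set)
    for parent, child in edges:
        neighbors[parent].add(child)
        neighbors[child].add(parent)

    seen = set()
    components = []
    for task in tasks:
        if task in seen:
            continue
        seen.add(task)
        queue = deque([task])
        component = []
        while queue:
            current = queue.popleft()
            component.append(current)
            for nxt in sorted(neighbors[current]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    return components


def master_schedule(tasks, edges, num_workers):
    """Simulate a master handing ready tasks to workers, round by round.

    A task becomes ready once all of its parents have finished. Each
    round dispatches at most num_workers ready tasks.
    """
    task_set = set(tasks)
    children = defaultdict(list)
    pending_parents = {t: 0 for t in tasks}
    for parent, child in edges:
        # Ignore edges that belong to other subgraphs.
        if parent in task_set and child in task_set:
            children[parent].append(child)
            pending_parents[child] += 1

    ready = deque(t for t in tasks if pending_parents[t] == 0)
    rounds = []
    while ready:
        batch = [ready.popleft() for _ in range(min(num_workers, len(ready)))]
        rounds.append(batch)
        for done in batch:
            for child in children[done]:
                pending_parents[child] -= 1
                if pending_parents[child] == 0:
                    ready.append(child)
    return rounds


# Hypothetical five-task workflow with two independent subgraphs.
tasks = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "c"), ("d", "e")]
subgraphs = partition_independent(tasks, edges)
plans = [master_schedule(sub, edges, num_workers=2) for sub in subgraphs]
```

In a real deployment each subgraph would be submitted as its own MPI job, with rank 0 acting as the master and the remaining ranks as workers; the round-based simulation here only shows the dependency bookkeeping the master has to do.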
