Automating environmental computing applications with scientific workflows

Computational environmental science applications have evolved and grown more complex over the last decade. To meet the needs of these applications, computational methods and technologies have emerged that support their execution on heterogeneous, distributed systems. Among them are workflow management systems such as Pegasus. Researchers use Pegasus to model seismic wave propagation, to discover new celestial objects, to study RNA critical to human brain development, and to investigate other important research questions. This paper introduces scientific workflows and describes Pegasus and its main features. It then shows how the environmental science community has used Pegasus to automate scientific workflow executions on high performance and high throughput computing systems, through three use cases: two Earth science workflows and one climate science workflow.
