Metrics for heterogeneous scientific workflows: A case study of an earthquake science application

Scientific workflows are a common computational model for performing scientific simulations. They may comprise many jobs, many scientific codes, and many file dependencies. Because scientific workflow applications may include both high-performance computing (HPC) and high-throughput computing (HTC) jobs, meaningful performance metrics are difficult to define: neither traditional HPC metrics nor HTC metrics fully characterize such applications. We propose alternative metrics that accurately capture the scale of scientific workflows and quantify their efficiency. In this paper, we present several practical scientific workflow performance metrics and discuss them in the context of a large-scale scientific workflow application, the Southern California Earthquake Center CyberShake 1.0 Map calculation. Our metrics reflect both computational performance, such as floating-point operations and file access, and workflow performance, such as job and task scheduling and execution. We break performance down into three levels of granularity (the task, the workflow, and the application), presenting a complete view of application performance. We show how our proposed metrics can be used to compare multiple invocations of the same application, as well as executions of heterogeneous applications, quantifying both the amount of work performed and the efficiency of that work. Finally, we analyze CyberShake using our proposed metrics to identify potential application optimizations.
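To make the three-level breakdown concrete, the sketch below shows one plausible way per-task measurements (runtime, cores, floating-point operations, file I/O) could roll up into workflow-level and application-level aggregates. This is a minimal illustration, not the paper's actual metric definitions: the class names (Task, Workflow, Application) and metric names (core_hours, avg_flop_rate) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    runtime_s: float   # task wall-clock time in seconds
    cores: int         # cores allocated (1 for HTC tasks, many for HPC jobs)
    flops: float       # floating-point operations (e.g., from hardware counters)
    io_bytes: float    # bytes read plus bytes written

@dataclass
class Workflow:
    tasks: List[Task] = field(default_factory=list)
    makespan_s: float = 0.0  # submission-to-completion wall time

    def core_hours(self) -> float:
        # Consumed compute: sum of per-task core-seconds, in hours.
        return sum(t.runtime_s * t.cores for t in self.tasks) / 3600.0

    def total_flops(self) -> float:
        return sum(t.flops for t in self.tasks)

@dataclass
class Application:
    workflows: List[Workflow] = field(default_factory=list)

    def core_hours(self) -> float:
        return sum(w.core_hours() for w in self.workflows)

    def avg_flop_rate(self) -> float:
        # FLOP/s per consumed core-second: a single axis on which
        # HPC-dominated and HTC-dominated runs can be compared.
        core_s = self.core_hours() * 3600.0
        total = sum(w.total_flops() for w in self.workflows)
        return total / core_s if core_s else 0.0

# Example: a workflow mixing one HPC job and two HTC tasks
# (all numbers invented for illustration).
wf = Workflow(tasks=[
    Task(runtime_s=7200, cores=400, flops=5e15, io_bytes=2e11),
    Task(runtime_s=120, cores=1, flops=1e9, io_bytes=5e7),
    Task(runtime_s=90, cores=1, flops=8e8, io_bytes=4e7),
], makespan_s=9000)
app = Application(workflows=[wf])
print(f"{app.core_hours():.1f} core-hours, "
      f"{app.avg_flop_rate():.3g} FLOP/s per core")
```

Normalizing by consumed core time rather than wall-clock time is one way such metrics can remain comparable across invocations that differ in parallelism; the paper's own definitions may differ.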
