The future of scientific workflows

Today’s computational, experimental, and observational sciences rely on computations that involve many related tasks. The success of a scientific mission often hinges on the computer automation of these workflows. In April 2015, the US Department of Energy (DOE) invited a diverse group of domain and computer scientists from national laboratories supported by the Office of Science and the National Nuclear Security Administration, from industry, and from academia to review the workflow requirements of DOE’s science and national security missions, to assess the current state of the art in science workflows, to understand the impact of emerging extreme-scale computing systems on those workflows, and to develop requirements for automated workflow management in future and existing environments. This article summarizes the opinions of more than 50 leading researchers who attended this workshop. We highlight use cases, computing systems, and workflow needs, and conclude by summarizing the remaining challenges that this community sees as inhibiting large-scale scientific workflows from becoming a mainstream tool for extreme-scale science.
