Automation of Network-Based Scientific Workflows

Comprehensive, end-to-end data and workflow management solutions are needed to handle the increasing process complexity and data volumes associated with modern distributed scientific problem solving, such as ultrascale simulations and high-throughput experiments. The key to the solution is an integrated network-based framework that is functional, dependable, and fault-tolerant, and that supports data and process provenance. Such a framework needs to make the development and use of application workflows dramatically easier, so that scientists' efforts can shift away from data management and utility software development and toward scientific research and discovery. An integrated view of these activities is provided by the notion of scientific workflows: series of structured activities and computations that arise in scientific problem solving. One information technology framework that supports scientific workflows is Kepler, an environment based on Ptolemy II. This paper discusses the issues associated with the practical automation of scientific processes and workflows, and illustrates them with workflows developed using the Kepler framework and tools.
