User-steering of HPC workflows: state-of-the-art and future directions

In 2006, a group of leading researchers gathered to discuss several challenges facing the technologies that support scientific workflows. Many of these remain open, such as the steering of workflows by users. Because of big data and long-running workflows, many users demand steering features such as real-time monitoring, analysis, and especially interference in the execution. The workflow execution should respond dynamically to such interference in order to support the experimentation process in high performance computing. This paper revisits the issues in user steering and dynamic workflows, presenting the state of the art and the open challenges. Our goal is to discuss research issues related to scientists' steering and to present some ideas on how these demands may be supported in current scientific workflow technologies.
