XML-based computation for scientific workflows

Scientific workflows are increasingly used to rapidly integrate existing algorithms into larger and more complex programs. However, designing workflows with purely dataflow-oriented computation models introduces several challenges, including the need for low-level components that mediate and transform data (so-called shims) and large numbers of additional "wires" for routing data to components within a workflow. To address these problems, we employ Virtual Data Assembly Lines (VDAL), a modeling paradigm that can eliminate most shims and reduce wiring complexity. We show how a VDAL design can be implemented using existing XML technologies, and how static analysis can significantly help scientists during workflow design and evolution, e.g., by displaying actor dependencies or by detecting so-called unproductive actors.
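To make the assembly-line idea concrete, here is a minimal Python sketch. It is not the authors' implementation, and its model is deliberately simplified. Each hypothetical `Actor` has three tags: a *scope* tag naming the subtrees it works inside, a *read* tag for the child elements it consumes, and a *write* tag for the element it appends. Actors are chained linearly over one shared XML tree, so data outside an actor's scope simply passes through without shims or extra wires. The hypothetical `analyze` function is an equally simplified, tag-level stand-in for the static analysis described above. It reports which actors depend on which upstream outputs, and which actors can never fire (are unproductive) given the tags present in the input. A real VDAL design would presumably use richer, XPath-style scope and read patterns; here plain tag names are assumed.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Actor:
    """Hypothetical VDAL-style actor (illustrative only, not the paper's API)."""
    name: str
    scope: str                      # tag of the subtrees this actor works inside
    read: str                       # tag of the child elements it consumes
    write: str                      # tag of the child element it appends
    fn: Callable[[List[str]], str]  # computation over the text of the read elements

    def fire(self, root: ET.Element) -> int:
        """Invoke once per matching scope; everything else passes through untouched."""
        fired = 0
        # Materialize the matches first because we append children while iterating.
        for scope_el in list(root.iter(self.scope)):
            inputs = [el.text or "" for el in scope_el.findall(self.read)]
            if not inputs:  # nothing to read in this scope, so no invocation
                continue
            out = ET.SubElement(scope_el, self.write)
            out.text = self.fn(inputs)
            fired += 1
        return fired


def run_pipeline(actors: List[Actor], root: ET.Element) -> Dict[str, int]:
    """Linear assembly line: each actor, in order, sees the whole tree."""
    return {a.name: a.fire(root) for a in actors}


def analyze(actors: List[Actor], initial_tags: Set[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Tag-level approximation: unproductive actors and read dependencies."""
    available = set(initial_tags)
    producer: Dict[str, str] = {}  # tag -> most recent upstream actor writing it
    unproductive: List[str] = []
    deps: Dict[str, List[str]] = {}
    for a in actors:
        if a.scope in available and a.read in available:
            deps[a.name] = [producer[a.read]] if a.read in producer else []
            available.add(a.write)
            producer[a.write] = a.name
        else:
            unproductive.append(a.name)  # its inputs can never appear
    return unproductive, deps


def gc_content(seqs: List[str]) -> str:
    """Fraction of G/C bases over all sequences in the scope."""
    total = sum(len(s) for s in seqs)
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    return f"{gc / total:.2f}" if total else "0.00"


pipeline = [
    Actor("count", "sample", "seq", "count", lambda xs: str(len(xs))),
    Actor("gc", "sample", "seq", "gc", gc_content),
    Actor("flag", "sample", "gc", "flag", lambda xs: "high" if float(xs[0]) > 0.5 else "low"),
    Actor("tree", "sample", "alignment", "tree", lambda xs: ";".join(xs)),  # never fires
]

if __name__ == "__main__":
    doc = ET.fromstring(
        "<run><sample><seq>ACGT</seq><seq>GGCC</seq></sample>"
        "<sample><seq>ATAT</seq></sample></run>"
    )
    print(analyze(pipeline, {"run", "sample", "seq"}))
    print(run_pipeline(pipeline, doc))
    print(ET.tostring(doc, encoding="unicode"))
```

On this sample input, `analyze` flags `tree` as unproductive because no upstream actor writes an `alignment` element. It also reports that `flag` depends on `gc`, and the analysis does this without executing anything. The design choice being illustrated is that actors never need explicit wiring. Each one selects its own inputs from the shared structure by scope.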
