APE: A Command-Line Tool and API for Automated Workflow Composition

Automated workflow composition promises to take work with scientific workflows to the next level. Building on today’s comprehensive eScience infrastructure, it enables the automated generation of possible workflows for a given specification. However, functionality for automated workflow composition tends to be integrated into one of the many available workflow management systems and is therefore difficult or impossible to apply in other environments. We have therefore developed APE (the Automated Pipeline Explorer) as a command-line tool and API for the automated composition of scientific workflows. APE is easily configured for a new application domain by providing it with a domain ontology and semantically annotated tools. It can then be used to synthesize purpose-specific workflows from a specification of the available workflow inputs, the desired outputs, and, optionally, additional constraints. The workflows can further be transformed into executable implementations and/or exported into standard workflow formats. In this paper, we describe APE v1.0 and discuss lessons learned from applications in bioinformatics and geosciences.
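To make the idea of composing a workflow from inputs, desired outputs, and annotated tools more concrete, the following is a minimal sketch in Python. It is not APE's implementation or API: it swaps in a plain breadth-first search over sets of available data types, and the tool names, type names, and the `compose` function are all hypothetical, chosen only for illustration.

```python
from collections import deque

# Hypothetical tool annotations (not taken from APE or any real registry):
# tool name -> (required input types, produced output types).
TOOLS = {
    "read_table": ({"CSV"}, {"Table"}),
    "filter_rows": ({"Table"}, {"FilteredTable"}),
    "plot": ({"FilteredTable"}, {"Figure"}),
    "summarize": ({"Table"}, {"Report"}),
}


def compose(inputs, goal, tools=TOOLS, max_length=5):
    """Return a shortest tool sequence producing every type in `goal`, or None.

    Illustrative breadth-first search; an assumption for this sketch,
    not the synthesis method used by APE.
    """
    start = frozenset(inputs)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        available, steps = queue.popleft()
        if goal <= available:
            return steps
        if len(steps) == max_length:
            continue
        for name in sorted(tools):
            needs, gives = tools[name]
            # A tool is applicable if its inputs exist and it adds something new.
            if needs <= available and not gives <= available:
                nxt = available | gives
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [name]))
    return None


workflow = compose({"CSV"}, {"Figure"})
print(workflow)
```

In this toy setting, a constraint such as "use tool X" or "do not use tool Y" could be modelled as an extra filter on candidate sequences; a domain ontology would additionally let a tool accept any subtype of its annotated input type.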
