Background

Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphic User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported.

Results

We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined it is possible to use and parameterize them as generic re-usable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users.

Availability: http://if-web1.imb.uq.edu.au/Pise/5.a/gpipe.html (interactive), ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/ (download).

Conclusion

From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into re-usable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and integration of heterogeneous analytical tools.
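As an illustration of what it could mean to treat iteration and conditional statements as generic re-usable components rather than hard-coded logic, the following is a minimal Python sketch. It is not GPIPE or PISE code: the operator names (`sequence`, `conditional`, `iterate`) and the toy analysis steps (`clean`, `reverse_complement`, `is_long`, `identity`) are hypothetical and chosen only for this example.

```python
from typing import Any, Callable, Iterable, List

# A workflow step is assumed here to be any function from one value to another.
Step = Callable[[Any], Any]


def sequence(*steps: Step) -> Step:
    """Compose steps so that each output feeds the next step."""
    def run(data: Any) -> Any:
        for step in steps:
            data = step(data)
        return data
    return run


def conditional(predicate: Callable[[Any], bool],
                if_true: Step, if_false: Step) -> Step:
    """Choose one of two steps depending on a test of the data."""
    def run(data: Any) -> Any:
        return if_true(data) if predicate(data) else if_false(data)
    return run


def iterate(step: Step) -> Step:
    """Apply the same step to every item of a collection."""
    def run(items: Iterable[Any]) -> List[Any]:
        return [step(item) for item in items]
    return run


# Hypothetical stand-ins for analytical methods.
def clean(seq: str) -> str:
    return seq.strip().upper()


def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def is_long(seq: str) -> bool:
    return len(seq) >= 6


def identity(value: Any) -> Any:
    return value


# The pipeline is assembled from operators, not written as ad hoc control flow.
pipeline = iterate(
    sequence(clean, conditional(is_long, reverse_complement, identity))
)

results = pipeline([" acgtac ", "acg"])
print(results)
```

Because each operator returns an ordinary step, operators can be nested and parameterized freely. A serialisable description of the same structure (for example in XML, as the abstract describes for GPIPE) would need each operator and its parameters to be recorded as data, which this sketch does not attempt.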