Automation of in-silico data analysis processes through workflow management systems

Data integration is needed to cope with the huge amount of biological information now available and to perform data mining effectively. Current data integration systems have strict limitations, mainly due to the number of resources, their size and frequency of updates, and their heterogeneity and distribution across the Internet. Integration must therefore be achieved by accessing network services through flexible and extensible tools for data integration and analysis. Extensible Markup Language (XML), Web Services and Workflow Management Systems (WMS) can support the creation and deployment of such tools. Many XML languages and Web Services for bioinformatics have already been designed and implemented, and several WMS have been proposed. In this article, we review a methodology for data integration in biomedical research that is based on these technologies. We also briefly describe some of the available WMS and discuss the current limitations of this methodology and the ways in which they can be overcome.
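As an illustration of the general idea only, and not of the specific methodology or any particular WMS reviewed here, the following minimal Python sketch treats each workflow step as a stand-in for a Web Service that exchanges XML documents, and uses a tiny engine to run the steps in dependency order. The step names (`fetch_sequence`, `gc_content`, `seq_length`, `summarize`), the record id and the toy sequence are hypothetical; a real WMS would invoke remote services rather than local functions.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, Iterable

# Hypothetical local stand-ins for remote Web Services. Each step receives the
# XML outputs of its upstream steps (keyed by step name) and returns XML text.
Step = Callable[[Dict[str, str]], str]


def fetch_sequence(inputs: Dict[str, str]) -> str:
    # Pretend retrieval service: returns a toy sequence record as XML.
    record = ET.Element("sequence", id="demo1")
    record.text = "ATGCGCATTA"
    return ET.tostring(record, encoding="unicode")


def gc_content(inputs: Dict[str, str]) -> str:
    # Pretend analysis service: parses the upstream XML, computes GC fraction.
    seq = ET.fromstring(inputs["fetch"])
    text = seq.text or ""
    gc = sum(base in "GC" for base in text) / len(text) if text else 0.0
    result = ET.Element("gc", id=seq.get("id", ""))
    result.text = f"{gc:.2f}"
    return ET.tostring(result, encoding="unicode")


def seq_length(inputs: Dict[str, str]) -> str:
    # Pretend analysis service: reports the sequence length as XML.
    seq = ET.fromstring(inputs["fetch"])
    result = ET.Element("length", id=seq.get("id", ""))
    result.text = str(len(seq.text or ""))
    return ET.tostring(result, encoding="unicode")


def summarize(inputs: Dict[str, str]) -> str:
    # Integration step: merges the results of two independent analyses.
    gc = ET.fromstring(inputs["gc"])
    length = ET.fromstring(inputs["length"])
    summary = ET.Element("summary", id=gc.get("id", ""),
                         gc=gc.text or "", length=length.text or "")
    return ET.tostring(summary, encoding="unicode")


# Workflow definition: step name -> service, and step name -> upstream steps.
STEPS: Dict[str, Step] = {
    "fetch": fetch_sequence,
    "gc": gc_content,
    "length": seq_length,
    "summary": summarize,
}
DEPENDENCIES: Dict[str, Iterable[str]] = {
    "fetch": (),
    "gc": ("fetch",),
    "length": ("fetch",),
    "summary": ("gc", "length"),
}


def run_workflow(steps: Dict[str, Step],
                 deps: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Enact the workflow: run each step once all its inputs are available."""
    outputs: Dict[str, str] = {}
    for name in TopologicalSorter(deps).static_order():
        upstream = {d: outputs[d] for d in deps.get(name, ())}
        outputs[name] = steps[name](upstream)
    return outputs


if __name__ == "__main__":
    print(run_workflow(STEPS, DEPENDENCIES)["summary"])
```

Keeping the workflow definition (`STEPS`, `DEPENDENCIES`) separate from the enactment engine (`run_workflow`) mirrors, in a very simplified form, the separation between workflow design and execution that WMS provide: the same engine can run a different pipeline by swapping the step table.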
