Automatic Annotation of Bioinformatics Workflows with Biomedical Ontologies

Legacy scientific workflows, and the services within them, often have sparse, unstructured (i.e. free-text) descriptions. This makes them difficult to find, share and reuse, which dramatically reduces their value to the community. This paper presents an approach to annotating workflows and their subcomponents with ontology terms, so that these artifacts are described in a structured way. Despite a dearth of even textual descriptions, we automatically annotated 530 bioinformatics-related workflows from myExperiment, including more than 2600 workflow-associated services, with relevant ontological terms. Quantitative evaluation of the Information Content of these terms suggests that, where annotation was possible at all, its quality was comparable to that of manually curated bioinformatics resources.
