Data curation + process curation = data integration + science

In bioinformatics, we are familiar with the idea of curated data as a prerequisite for data integration. We neglect, often to our cost, the curation and cataloguing of the processes that we use to integrate and analyse our data. Programmatic access to services, for both data and processes, means that services can be composed into workflows that represent the in silico experiments or processes that bioinformaticians perform. Data integration through workflows depends on knowing what services exist and where to find them. However, the large number of services and operations, their arbitrary naming and their lack of documentation can make them difficult to use. The workflows themselves are composite processes that could be pooled and reused, but only if they too can be found and understood. Appropriate curation, including semantic mark-up, would therefore enable processes to be found, maintained and consequently used more easily. This broader view of semantic annotation is vital for the full data integration that modern scientific analyses in biology require. This article will brief the community on the current state of the art and the current challenges for process curation, both within and beyond the Life Sciences.
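To illustrate why semantic mark-up helps where arbitrary names do not, here is a minimal sketch of a curated catalogue. It assumes that each service or workflow carries annotations for its task and its input and output data types. All names and annotation terms below (`runJob_v2`, `similarity_search`, `protein_sequence` and so on) are hypothetical placeholders rather than terms from any real ontology or registry. Discovery matches on annotations instead of names. A simple breadth-first search then finds the shortest chain of catalogued entries that turns one data type into another. Because workflows are catalogued the same way as single services, an existing workflow can be found and reused.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical catalogue entry for a service or a workflow (a composite process)."""
    name: str                  # arbitrary, often uninformative, provider-chosen name
    task: str                  # hypothetical ontology-style term for what the operation does
    inputs: frozenset[str]     # hypothetical terms for the data types it accepts
    outputs: frozenset[str]    # hypothetical terms for the data types it produces
    is_workflow: bool = False  # workflows are curated exactly like single services


def discover(catalogue, task=None, consumes=None, produces=None):
    """Return names of entries whose annotations satisfy every given constraint."""
    matches = []
    for entry in catalogue:
        if task is not None and entry.task != task:
            continue
        if consumes is not None and consumes not in entry.inputs:
            continue
        if produces is not None and produces not in entry.outputs:
            continue
        matches.append(entry.name)
    return matches


def compose(catalogue, have, want, max_steps=3):
    """Breadth-first search for the shortest chain of entries turning type `have` into `want`.

    Returns a list of entry names, or None if no chain of at most `max_steps` exists.
    """
    queue = deque([(have, [])])
    seen = {have}
    while queue:
        current, chain = queue.popleft()
        if current == want:
            return chain
        if len(chain) >= max_steps:
            continue
        for entry in catalogue:
            if current in entry.inputs:
                for out in entry.outputs:
                    if out not in seen:
                        seen.add(out)
                        queue.append((out, chain + [entry.name]))
    return None


# Hypothetical catalogue: the names say little, the annotations say what each entry does.
catalogue = [
    ServiceDescription("runJob_v2", "similarity_search",
                       frozenset({"protein_sequence"}), frozenset({"alignment_report"})),
    ServiceDescription("getData", "retrieval",
                       frozenset({"accession_id"}), frozenset({"protein_sequence"})),
    ServiceDescription("wf_0017", "similarity_search",
                       frozenset({"accession_id"}), frozenset({"alignment_report"}),
                       is_workflow=True),
]

if __name__ == "__main__":
    print(discover(catalogue, task="similarity_search"))
    print(compose(catalogue, "accession_id", "alignment_report"))
```

Here `discover(catalogue, task="similarity_search")` returns `["runJob_v2", "wf_0017"]`, even though neither name mentions similarity search. `compose(catalogue, "accession_id", "alignment_report")` returns `["wf_0017"]`. The already-curated workflow is found before the two-step chain `getData` → `runJob_v2` would be rebuilt. Real systems would use shared ontologies and richer descriptions. This sketch only shows the principle that curated annotations, not names, make processes findable.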
