Using provenance to manage knowledge of in silico experiments

This article offers a briefing on one of the knowledge management issues of in silico experimentation in bioinformatics. Recording the provenance of an experiment (what was done, where, how and why) is an important aspect of scientific best practice that should be extended to in silico experimentation. We discuss provenance in the context of eScience, which has been part of the move of bioinformatics towards an industrial setting. Despite their computational nature, bioinformatics analyses are scientific and thus need their own versions of typical scientific rigour. Just as recording the who, what, why, when, where and how of an experiment is central to the scientific process in laboratory science, so it should be in in silico science. Generating and recording these aspects of an experiment, its provenance, are necessary knowledge management goals if we are to introduce scientific rigour into routine bioinformatics. In silico experimental protocols should themselves be a form of managing the knowledge of how to perform bioinformatics analyses. Several systems now exist that support the generation and collection of provenance information: how a particular in silico experiment was run, what results were generated, how they were generated, and so on. In reviewing provenance support, we review one of the important knowledge management issues in bioinformatics.
