Provenance in bioinformatics workflows

In this work, we used the PROV-DM model to manage data provenance in genome project workflows. The model stores the details of a workflow execution, e.g., raw and produced data, the computational tools used, and their versions and parameters. With it, biologists can inspect a particular execution of a workflow, compare results produced by different executions, and plan new experiments more efficiently. We also created a provenance simulator, which facilitates the inclusion of provenance data from a genome project workflow execution. Finally, we discuss a case study that aims to identify genes involved in specific metabolic pathways of Bacillus cereus and to compare this isolate with other phylogenetically related bacteria from the Bacillus group. The B. cereus isolate is an extremophilic bacterium collected in warm water in the Midwestern Region of Brazil, and its DNA samples were sequenced with an NGS machine.
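To make the idea concrete, the following is a minimal Python sketch of how one workflow execution might be recorded with the three core PROV-DM types (entity, activity, agent) and how two executions could be compared. It is an illustration only, not the schema or simulator used in this work: the class layout, the `diff_executions` helper, the tool name "assembler", the parameter `k`, and the file names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A piece of data: raw reads, an assembly, an annotation file."""
    id: str


@dataclass(frozen=True)
class Agent:
    """A computational tool, identified together with its version."""
    name: str
    version: str


@dataclass
class Activity:
    """One execution of a tool inside a workflow run."""
    id: str
    agent: Agent                                    # PROV-DM: wasAssociatedWith
    parameters: dict = field(default_factory=dict)  # hypothetical attribute bag
    used: list = field(default_factory=list)        # PROV-DM: used
    generated: list = field(default_factory=list)   # PROV-DM: wasGeneratedBy


def diff_executions(run_a, run_b):
    """Return the steps whose tool or parameters differ between two runs."""
    b_by_id = {act.id: act for act in run_b}
    changes = {}
    for act in run_a:
        other = b_by_id.get(act.id)
        if other is None:
            continue  # step absent from the second run; not compared here
        if (act.agent, act.parameters) != (other.agent, other.parameters):
            changes[act.id] = (
                (act.agent.version, act.parameters),
                (other.agent.version, other.parameters),
            )
    return changes


# Two hypothetical executions of the same assembly step on the same raw data.
reads = Entity("raw_reads.fastq")
run1 = [Activity("assembly", Agent("assembler", "1.0"), {"k": 31},
                 [reads], [Entity("contigs_run1.fasta")])]
run2 = [Activity("assembly", Agent("assembler", "1.0"), {"k": 51},
                 [reads], [Entity("contigs_run2.fasta")])]

result = diff_executions(run1, run2)
print(result)
```

Keeping the tool version and parameters on each activity is what makes two executions comparable step by step; a real deployment would persist these records in a database rather than in memory.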
