Paper information - Automatic capture of provenance data in genome project workflows

In bioinformatics, many scientific experiments are designed as computational workflows, which facilitates their implementation and analysis. However, the amount of data generated grows with every phase of each execution, making it hard to identify the source of the data and the transformations applied to them. It has therefore become necessary to create tools that automatically verify which resources and parameters were used to generate the results, along with the other information needed to validate and publish the experiment. Automatic capture of data provenance has been receiving attention in the scientific community, particularly in bioinformatics projects, because the same workflow is executed several times with different parameters and tool versions. In this paper, we propose using a relational schema to automatically store data provenance, following the PROV-DM model, for workflows in bioinformatics projects.
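To make the idea of a relational provenance store concrete, the following is a minimal sketch, not the schema proposed in the paper. It assumes the three core PROV-DM types (entity, activity, agent) and three of its relations (used, wasGeneratedBy, wasAssociatedWith), and maps each to a table in an in-memory SQLite database. All table names, column names, and helper functions (`open_store`, `record_step`, `how_was_it_made`), as well as the sample tool and file names, are hypothetical and chosen for illustration only.

```python
import sqlite3

# Illustrative schema only: one table per PROV-DM core type and relation.
# Names are assumptions, not taken from the paper.
SCHEMA = """
CREATE TABLE entity   (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE agent    (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE activity (id INTEGER PRIMARY KEY, tool TEXT NOT NULL,
                       version TEXT, parameters TEXT);
CREATE TABLE used (
    activity_id INTEGER NOT NULL REFERENCES activity(id),
    entity_id   INTEGER NOT NULL REFERENCES entity(id));
CREATE TABLE was_generated_by (
    entity_id   INTEGER NOT NULL REFERENCES entity(id),
    activity_id INTEGER NOT NULL REFERENCES activity(id));
CREATE TABLE was_associated_with (
    activity_id INTEGER NOT NULL REFERENCES activity(id),
    agent_id    INTEGER NOT NULL REFERENCES agent(id));
"""


def open_store(path=":memory:"):
    """Create a provenance store with the sketch schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn


def _get_or_create(conn, table, name):
    # `table` is always one of our own fixed table names, never user input.
    row = conn.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()
    if row:
        return row[0]
    return conn.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,)).lastrowid


def record_step(conn, tool, version, parameters, inputs, outputs, agent):
    """Record one workflow step: the tool run, its inputs, outputs, and who ran it."""
    activity_id = conn.execute(
        "INSERT INTO activity (tool, version, parameters) VALUES (?, ?, ?)",
        (tool, version, parameters),
    ).lastrowid
    agent_id = _get_or_create(conn, "agent", agent)
    conn.execute("INSERT INTO was_associated_with VALUES (?, ?)", (activity_id, agent_id))
    for name in inputs:
        conn.execute("INSERT INTO used VALUES (?, ?)",
                     (activity_id, _get_or_create(conn, "entity", name)))
    for name in outputs:
        conn.execute("INSERT INTO was_generated_by VALUES (?, ?)",
                     (_get_or_create(conn, "entity", name), activity_id))
    conn.commit()
    return activity_id


def how_was_it_made(conn, output_name):
    """Return (tool, version, parameters, input_name) rows for a given output."""
    return conn.execute(
        """
        SELECT a.tool, a.version, a.parameters, src.name
        FROM entity res
        JOIN was_generated_by g ON g.entity_id = res.id
        JOIN activity a         ON a.id = g.activity_id
        JOIN used u             ON u.activity_id = a.id
        JOIN entity src         ON src.id = u.entity_id
        WHERE res.name = ?
        ORDER BY a.id, src.name
        """,
        (output_name,),
    ).fetchall()


if __name__ == "__main__":
    store = open_store()
    # Hypothetical tool, version, parameters, and file names.
    record_step(store, "blastn", "2.2.26", "-evalue 1e-5",
                ["reads.fasta"], ["hits.tsv"], "alice")
    print(how_was_it_made(store, "hits.tsv"))
```

Because each execution is stored as its own activity row, running the same workflow step again with different parameters or a different tool version adds a new row instead of overwriting the old one, which is the situation the abstract highlights.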

Maristela Holanda | Aletéia Patrícia Favacho de Araújo | Sérgio Lifschitz | Maria Emilia Telles Walter | Rodrigo Pinheiro
