Automatic capture of provenance data in genome project workflows

In bioinformatics, many scientific experiments are designed as computational workflows, which facilitates their implementation and analysis. However, the amount of data generated grows at every phase of each execution, making it difficult to identify the source of the data and the transformations applied to it. New tools are therefore needed to verify automatically which resources and parameters were used to generate the results, along with other information required to validate and publish the experiment. Automatic capture of data provenance has been receiving attention in the scientific community, particularly in bioinformatics projects, because the same workflow is executed several times with different parameters and tool versions. In this paper, we propose using a relational schema to automatically store data provenance, based on the PROV-DM model, for workflows in bioinformatics projects.
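
To make the proposal concrete, the following is a minimal sketch of how a relational store might record the core PROV-DM concepts (entity, activity, agent, and the used, wasGeneratedBy and wasAssociatedWith relations) for workflow executions. It is an illustration under stated assumptions, not the schema developed in this paper: the table and column names, the helper functions, and the tool names in the usage note are all hypothetical, and SQLite is used only for convenience.

```python
import sqlite3

# Hypothetical table and column names. Only the concepts they stand for
# (entity, activity, agent, used, wasGeneratedBy, wasAssociatedWith)
# come from PROV-DM.
SCHEMA = """
CREATE TABLE agent (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,      -- tool that carried out the step
    version TEXT             -- tool version used in this run
);
CREATE TABLE activity (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL,     -- workflow phase
    parameters TEXT,         -- parameters of this execution
    agent_id INTEGER REFERENCES agent(id)        -- wasAssociatedWith
);
CREATE TABLE entity (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL,      -- data file
    generated_by INTEGER REFERENCES activity(id) -- wasGeneratedBy
);
CREATE TABLE used (
    activity_id INTEGER NOT NULL REFERENCES activity(id),
    entity_id INTEGER NOT NULL REFERENCES entity(id),
    PRIMARY KEY (activity_id, entity_id)
);
"""


def open_store(path=":memory:"):
    """Create a provenance store with the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_raw_entity(conn, path):
    """Register an input file that no recorded activity produced."""
    cur = conn.execute("INSERT INTO entity (path) VALUES (?)", (path,))
    conn.commit()
    return cur.lastrowid


def record_run(conn, tool, version, label, parameters, inputs, outputs):
    """Record one workflow step and return the ids of its output entities."""
    cur = conn.cursor()
    cur.execute("INSERT INTO agent (name, version) VALUES (?, ?)", (tool, version))
    agent_id = cur.lastrowid
    cur.execute(
        "INSERT INTO activity (label, parameters, agent_id) VALUES (?, ?, ?)",
        (label, parameters, agent_id),
    )
    activity_id = cur.lastrowid
    for entity_id in inputs:
        cur.execute(
            "INSERT INTO used (activity_id, entity_id) VALUES (?, ?)",
            (activity_id, entity_id),
        )
    output_ids = []
    for path in outputs:
        cur.execute(
            "INSERT INTO entity (path, generated_by) VALUES (?, ?)",
            (path, activity_id),
        )
        output_ids.append(cur.lastrowid)
    conn.commit()
    return output_ids


def lineage(conn, entity_id):
    """Return (label, tool, version, parameters) for every step behind an entity."""
    steps, seen, frontier = [], set(), [entity_id]
    while frontier:
        row = conn.execute(
            "SELECT a.id, a.label, g.name, g.version, a.parameters "
            "FROM entity e "
            "JOIN activity a ON a.id = e.generated_by "
            "JOIN agent g ON g.id = a.agent_id "
            "WHERE e.id = ?",
            (frontier.pop(),),
        ).fetchone()
        if row is None or row[0] in seen:
            continue  # raw input, or activity already visited
        seen.add(row[0])
        steps.append(row[1:])
        frontier.extend(
            r[0]
            for r in conn.execute(
                "SELECT entity_id FROM used WHERE activity_id = ?", (row[0],)
            )
        )
    return steps
```

For example, after registering a raw input with `add_raw_entity(conn, "reads.fastq")` and recording two steps with `record_run` (a filtering step and an assembly step, both with hypothetical tool names), calling `lineage` on the final output returns the tools, versions and parameters behind it, which is the kind of question the abstract says must be answered automatically.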