Policy-Based Integration of Provenance Metadata

Reproducibility has been a cornerstone of the scientific method for hundreds of years. Three factors limit the reproducibility that a scientist can ensure solely by manually recording their actions: the range of sources from which data now originates, the diversity of the individual manipulations performed on it, and the complexity of orchestrating these operations. We describe an architecture in which aggregation, fusion, and composition policies define how provenance records are automatically merged to facilitate the analysis and reproducibility of experiments. We show that the overhead of collecting and storing provenance metadata can vary dramatically depending on the policy used to integrate it.
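The abstract names aggregation, fusion, and composition policies but does not define them. The Python sketch below shows one hypothetical reading and is not the authors' implementation or any real tool's API. It makes several assumptions:

- Provenance is stored as a graph of annotated vertices and causal edges.
- "Aggregation" collapses duplicate edges.
- "Fusion" merges vertices from different sources that share an annotation value, using a hypothetical `path` key.
- "Composition" chains policies together.

Every name in the code (`ProvenanceGraph`, `aggregate_duplicate_edges`, `fuse_on`, `compose`, `integrate`) is illustrative. The number of stored records stands in for storage overhead.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source vertex id, destination vertex id, relation)


@dataclass
class ProvenanceGraph:
    # vertex id -> annotations, e.g. {"type": "Artifact", "path": "/data/in.csv"}
    vertices: Dict[str, Dict[str, str]] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def size(self) -> int:
        """Number of stored records, used here as a rough proxy for storage overhead."""
        return len(self.vertices) + len(self.edges)


# Hypothetical: a policy maps a provenance graph to an integrated (possibly smaller) graph.
Policy = Callable[[ProvenanceGraph], ProvenanceGraph]


def aggregate_duplicate_edges(g: ProvenanceGraph) -> ProvenanceGraph:
    """Hypothetical aggregation policy: keep one edge per (src, dst, relation)."""
    # dict.fromkeys drops duplicates while preserving first-seen order.
    return ProvenanceGraph(dict(g.vertices), list(dict.fromkeys(g.edges)))


def fuse_on(key: str) -> Policy:
    """Hypothetical fusion policy: vertices sharing the same value for annotation
    `key` are assumed to describe one entity and are merged into the first such
    vertex; edges are redirected to the surviving vertex."""
    def policy(g: ProvenanceGraph) -> ProvenanceGraph:
        survivor_for_value: Dict[str, str] = {}
        remap: Dict[str, str] = {}
        vertices: Dict[str, Dict[str, str]] = {}
        for vid, ann in g.vertices.items():
            value = ann.get(key)
            if value is not None and value in survivor_for_value:
                survivor = survivor_for_value[value]
                remap[vid] = survivor
                # Union the annotations; values already on the survivor win.
                vertices[survivor] = {**ann, **vertices[survivor]}
            else:
                if value is not None:
                    survivor_for_value[value] = vid
                remap[vid] = vid
                vertices[vid] = dict(ann)
        edges = [(remap[s], remap[d], rel) for s, d, rel in g.edges]
        return ProvenanceGraph(vertices, edges)
    return policy


def compose(*policies: Policy) -> Policy:
    """Hypothetical composition: chain policies, applying them left to right."""
    def policy(g: ProvenanceGraph) -> ProvenanceGraph:
        for p in policies:
            g = p(g)
        return g
    return policy


def integrate(sources: Dict[str, ProvenanceGraph], policy: Policy) -> ProvenanceGraph:
    """Union the records of several sources, then apply the integration policy.
    Vertex ids are prefixed with the source name so they cannot collide."""
    merged = ProvenanceGraph()
    for name, g in sources.items():
        for vid, ann in g.vertices.items():
            merged.vertices[f"{name}:{vid}"] = dict(ann)
        merged.edges.extend((f"{name}:{s}", f"{name}:{d}", rel) for s, d, rel in g.edges)
    return policy(merged)


# Two hypothetical sources: host A read /data/in.csv three times; host B produced it.
host_a = ProvenanceGraph(
    {"p1": {"type": "Process", "name": "analyze"}, "f1": {"type": "Artifact", "path": "/data/in.csv"}},
    [("p1", "f1", "Used")] * 3,
)
host_b = ProvenanceGraph(
    {"q1": {"type": "Process", "name": "download"}, "g1": {"type": "Artifact", "path": "/data/in.csv"}},
    [("g1", "q1", "WasGeneratedBy")],
)
sources = {"hostA": host_a, "hostB": host_b}

policies = {
    "none": compose(),
    "aggregate": aggregate_duplicate_edges,
    "fuse": fuse_on("path"),
    "aggregate+fuse": compose(aggregate_duplicate_edges, fuse_on("path")),
}
for label, policy in policies.items():
    print(label, integrate(sources, policy).size())
```

On this toy input, the loop prints record counts of 8, 6, 7 and 5 for the four policies. These numbers only illustrate how the choice of integration policy changes the amount of metadata stored. They are not measurements from the paper.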
