From scripts towards provenance inference

Scientists need provenance information either to validate their models or to trace the origin of an unexpected value. In practice, however, they do not maintain provenance information, and even an explicit design of the processing workflow is rare. In this paper, we therefore propose a solution that builds the workflow provenance graph by interpreting the scripts used for the actual processing. Scientists can then request fine-grained provenance information by using the inferred workflow provenance. We also provide a guideline for customizing the workflow provenance graph according to user preferences. Our evaluation shows that the proposed approach is relevant and suitable for scientists to manage provenance.
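
To make the idea of interpreting a script concrete, the following is a minimal sketch and not the method of the paper. It assumes the processing script is written in Python and consists of simple top-level statements of the form `out = func(arg, ...)`. The function name `infer_edges`, the node naming scheme `func@Lline`, and the example script (with its hypothetical `load`, `filter_outliers` and `aggregate` steps) are illustrative assumptions. The script is only parsed, never executed.

```python
import ast


def infer_edges(script: str) -> list[tuple[str, str]]:
    """Return data-flow edges (source, target) inferred from simple assignments.

    Assumption: only top-level statements of the form `out = func(arg, ...)`
    are considered; everything else is ignored in this sketch.
    """
    edges = []
    for stmt in ast.parse(script).body:
        if not (isinstance(stmt, ast.Assign) and isinstance(stmt.value, ast.Call)):
            continue
        call = stmt.value
        # One processing node per call site, made unique by the line number.
        step = f"{ast.unparse(call.func)}@L{stmt.lineno}"
        # Every variable read in a positional or keyword argument is an input.
        arg_nodes = list(call.args) + [kw.value for kw in call.keywords]
        inputs = sorted(
            {n.id for arg in arg_nodes for n in ast.walk(arg) if isinstance(n, ast.Name)}
        )
        for name in inputs:
            edges.append((name, step))
        # Every plain variable on the left-hand side is an output of the step.
        for target in stmt.targets:
            if isinstance(target, ast.Name):
                edges.append((step, target.id))
    return edges


# Hypothetical three-step processing script, used only for illustration.
SCRIPT = (
    'raw = load("rain.csv")\n'
    "clean = filter_outliers(raw)\n"
    'monthly = aggregate(clean, period="month")\n'
)

for source, target in infer_edges(SCRIPT):
    print(f"{source} -> {target}")
```

The resulting edge list alternates between data items (variables) and processing steps (call sites), which is one common way to lay out a workflow provenance graph. A real script interpreter would also have to handle loops, branches, file I/O and in-place updates, all of which this sketch leaves out.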
