Fine-Grained Provenance Inference for a Large Processing Chain with Non-materialized Intermediate Views

Many applications use a data processing chain, i.e., a workflow, to process data. Results of intermediate processing steps may not be persisted, since reproducing them is not costly and they are rarely reused. However, in stream data processing, where data arrives continuously, explicitly documenting fine-grained provenance for a processing chain in order to reproduce results is not feasible, since the provenance data may grow to a multiple of the size of the actual sensor data. In this paper, we propose a multi-step provenance inference technique that infers provenance data for the entire workflow with non-materialized intermediate views. Our solution provides a high-quality provenance graph.
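To make the idea of inferring provenance across non-materialized steps more concrete, here is a minimal Python sketch of one simple, purely time-based way it could work. It is an illustration under stated assumptions, not the technique proposed in this paper. Every step is assumed to be a sliding time window. The `Step` type, which has a window size and a fixed trigger period starting at time 0, is hypothetical. The sketch also assumes zero processing delay and that each intermediate tuple is timestamped with the firing time of the step that produced it. Starting from one output, it walks the chain backwards and derives the interval of source timestamps that may have contributed, without reading any intermediate results. The helpers `infer_source_interval` and `infer_provenance` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    """Hypothetical model of one processing element in the chain.

    The step fires at every multiple of `trigger` (starting at time 0) and, at
    firing time s, aggregates all input tuples with timestamps in (s - window, s].
    Its output tuple is assumed to be timestamped s (zero processing delay).
    """
    name: str
    window: int
    trigger: int


def infer_source_interval(chain: List[Step], t_out: int) -> Optional[Tuple[int, int]]:
    """Return (lo, hi) such that source tuples with lo < ts <= hi may have
    contributed to the output of chain[-1] at time t_out, or None if no
    source tuple can have contributed. Intermediate outputs are never read."""
    last = chain[-1]
    if t_out % last.trigger != 0:
        raise ValueError("t_out must be a firing time of the last step")
    # Timestamps of the tuples consumed by the last step.
    lo, hi = t_out - last.window, t_out
    # Walk the remaining steps backwards, recomputing which of their
    # (non-materialized) firings fall inside the current interval.
    for step in reversed(chain[:-1]):
        first = (lo // step.trigger + 1) * step.trigger  # earliest firing > lo
        latest = (hi // step.trigger) * step.trigger     # latest firing <= hi
        if first > latest:
            return None  # the step never fired inside the interval
        lo, hi = first - step.window, latest
    return lo, hi


def infer_provenance(chain: List[Step], t_out: int, source_ts: Iterable[int]) -> List[int]:
    """Select the source timestamps that fall inside the inferred interval."""
    interval = infer_source_interval(chain, t_out)
    if interval is None:
        return []
    lo, hi = interval
    return [ts for ts in source_ts if lo < ts <= hi]


if __name__ == "__main__":
    # Two steps: a 10-unit window every 5 units, then a 20-unit window every 10 units.
    chain = [Step("avg", window=10, trigger=5), Step("max", window=20, trigger=10)]
    print(infer_source_interval(chain, 40))  # (15, 40)
```

Under these assumptions the inferred interval is an over-approximation. It may include source tuples that did not actually contribute, for example when a window is shorter than its trigger period and consecutive windows leave gaps. A realistic model would also have to account for processing delays and irregular tuple arrivals.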
