Efficient Lineage Management on Stream Processing Engines

In this paper, we present a method for making provenance persistent in a stream processing environment. To meet application requirements, we persist the provenance of the output of a stream processing engine (SPE). We implemented our proposal and evaluated it experimentally. The results show that the cost of transferring provenance depends on the location in the operator tree at which the transfer is performed. When selectivity is 0 (empty result), the transfer should be performed at the root of the operator tree. When selectivity is 1 (full result) and the operator tree includes multi-way Cartesian products, the transfer should be performed at the leaves of the operator tree.
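
The trade-off between transferring at the root and at the leaves can be illustrated with a toy cost model. This is a minimal sketch and not the paper's actual cost model or implementation: it assumes a single multi-way Cartesian product over `num_inputs` windows of `window_size` tuples each, counts cost as the number of source-tuple records shipped, and assumes that each output tuple at the root carries one lineage record per input. All function and parameter names are hypothetical.

```python
def leaf_transfer_cost(num_inputs: int, window_size: int) -> float:
    """Records shipped if provenance is transferred at the leaves.

    Assumption: every input tuple is shipped exactly once,
    regardless of how many results the query produces.
    """
    return float(num_inputs * window_size)


def root_transfer_cost(num_inputs: int, window_size: int, selectivity: float) -> float:
    """Records shipped if provenance is transferred at the root.

    Assumption: a multi-way Cartesian product yields
    selectivity * window_size ** num_inputs output tuples, and each
    output tuple carries one lineage record per input stream.
    """
    output_tuples = selectivity * window_size ** num_inputs
    return output_tuples * num_inputs


def cheaper_location(num_inputs: int, window_size: int, selectivity: float) -> str:
    """Return the transfer location with the lower cost under this toy model."""
    leaf = leaf_transfer_cost(num_inputs, window_size)
    root = root_transfer_cost(num_inputs, window_size, selectivity)
    return "root" if root < leaf else "leaves"


if __name__ == "__main__":
    # Empty result: nothing reaches the root, so transferring there is free.
    print(cheaper_location(num_inputs=3, window_size=10, selectivity=0.0))
    # Full result of a 3-way Cartesian product: the root sees far more lineage records.
    print(cheaper_location(num_inputs=3, window_size=10, selectivity=1.0))
```

Under these assumptions the model agrees with the two extreme cases described above: with selectivity 0 the root is cheaper, and with selectivity 1 over a multi-way Cartesian product the leaves are cheaper. Intermediate selectivities and other operators would need a more detailed model.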
