Transactional stream processing

Many stream processing applications require access to a multitude of streaming as well as stored data sources. Yet there are no clear semantics for correct continuous query execution over these data sources in the face of concurrent access and failures. Instead, today's Stream Processing Systems (SPSs) hard-code transactional concepts in their execution models, making them both hard to understand and inflexible to use. In this paper, we show that traditional transactional theory can be reused, with some minimal extensions, to cleanly define the correct interaction of a set of continuous and one-time queries concurrently accessing both streaming and stored data sources. The result is a unified transactional model (UTM) for query processing over streams as well as traditional databases. We present a transaction manager that implements this model on top of an existing storage manager for streams (MXQuery/SMS). Experiments on the Linear Road Benchmark show that our transaction manager flexibly ensures correctness under concurrency and failures without sacrificing performance. Moreover, the model is powerful enough to express the implicit transactional behaviors of a representative set of state-of-the-art SPSs.
