A Temporal Foundation for Continuous Queries over Data Streams

Despite the surge of research in continuous stream processing, a semantic gap remains. In many cases, continuous queries are formulated in an enriched SQL-like query language without specifying the semantics of such a query precisely enough. To overcome this problem, we present a sound and well-defined temporal operator algebra over data streams that ensures deterministic results for continuous queries. In analogy to traditional database systems, we distinguish between a logical and a physical operator algebra. While our logical operator algebra specifies the semantics of each operation descriptively over temporal multisets, the physical operator algebra provides adequate implementations in the form of stream-to-stream operators. We show that query plans built with either the logical or the physical algebra produce snapshot-equivalent results. Moreover, we introduce a rich set of transformation rules that forms a solid foundation for query optimization, one of the major research topics in the stream community. Examples throughout the paper motivate the applicability of our approach and illustrate the steps from query formulation to query execution.
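
To make the notion of snapshot equivalence more concrete, the following Python sketch compares a selection applied to a point-based temporal multiset with the same selection applied to an interval-based stream. It is an illustration only, not the paper's definitions: the representation of logical elements as (value, instant) pairs, of physical elements as (value, start, end) triples with half-open validity intervals, the integer time domain, and all function names are assumptions made for this example.

```python
from collections import Counter

# Assumed representations (hypothetical, for illustration only):
#   logical stream:  Counter mapping (value, instant) -> multiplicity
#   physical stream: list of (value, start, end), valid during [start, end)

def to_logical(physical):
    """Expand each interval element into one entry per time instant."""
    return Counter((v, t) for (v, s, e) in physical for t in range(s, e))

def snapshot_logical(logical, t):
    """Multiset of values valid at instant t in the point-based view."""
    snap = Counter()
    for (v, ts), n in logical.items():
        if ts == t:
            snap[v] += n
    return snap

def snapshot_physical(physical, t):
    """Multiset of values whose validity interval covers instant t."""
    return Counter(v for (v, s, e) in physical if s <= t < e)

def select_logical(pred, logical):
    """Selection defined descriptively over the temporal multiset."""
    return Counter({(v, t): n for (v, t), n in logical.items() if pred(v)})

def select_physical(pred, physical):
    """Selection as a stream-to-stream operator: filter element by element."""
    return [(v, s, e) for (v, s, e) in physical if pred(v)]

def snapshot_equivalent(logical, physical, instants):
    """True if both representations agree on every snapshot in `instants`."""
    return all(
        snapshot_logical(logical, t) == snapshot_physical(physical, t)
        for t in instants
    )

# Example input: two overlapping "a" elements and one "b" element.
physical = [("a", 0, 3), ("b", 1, 4), ("a", 2, 5)]
logical = to_logical(physical)

def is_a(v):
    return v == "a"

equivalent = snapshot_equivalent(
    select_logical(is_a, logical),
    select_physical(is_a, physical),
    range(0, 6),
)
```

Here `equivalent` is checked only over a finite range of instants; a real stream is unbounded, so a check like this can illustrate the property on a prefix but cannot establish it in general.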
