Maintaining consistent results of continuous queries under diverse window specifications

Continuous queries applied over nonterminating data streams usually specify windows in order to obtain an evolving, yet restricted, set of tuples and thus provide timely and incremental results. Although sliding windows are frequently employed in user requests, additional types such as partitioned or landmark windows are also available in stream processing engines. In this paper, we study the existence of monotonic-related semantics for a rich set of windowing constructs in order to facilitate more efficient maintenance of their changing contents. After laying out a formal foundation for expressing windowed queries, we investigate the update patterns observed in the most common window variants, as well as their impact on adaptations of typical operators (such as windowed join, union, or aggregation), thus offering more insight into the design and implementation of stream processing mechanisms. Furthermore, we identify syntactic equivalences in algebraic expressions involving windows, to the potential benefit of query optimization. Finally, this framework is validated for several windowed operations against streaming datasets, with simulations at diverse arrival rates and window specifications, providing concrete evidence of its significance.
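
To make the window variants named above concrete, the following is a minimal illustrative sketch, not the paper's formalism. It assumes a stream is a timestamp-ordered list of `(timestamp, key, value)` tuples; the function names, the half-open sliding range, and the count-based partitioned window are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical stream: (timestamp, key, value) tuples in timestamp order.
STREAM = [(1, "a", 10), (2, "b", 20), (3, "a", 30), (4, "b", 40), (5, "a", 50)]


def sliding_window(stream, now, size):
    """Time-based sliding window: tuples with timestamp in (now - size, now]."""
    return [t for t in stream if now - size < t[0] <= now]


def landmark_window(stream, now, landmark):
    """Landmark window: fixed lower bound, upper bound advances with `now`."""
    return [t for t in stream if landmark <= t[0] <= now]


def partitioned_window(stream, now, rows):
    """Partitioned window: the most recent `rows` tuples per key seen by `now`."""
    groups = defaultdict(list)
    for t in stream:
        if t[0] <= now:
            groups[t[1]].append(t)
    return {key: tuples[-rows:] for key, tuples in groups.items()}


if __name__ == "__main__":
    for now in (3, 4, 5):
        print(now, sliding_window(STREAM, now, 2), landmark_window(STREAM, now, 2))
    print(partitioned_window(STREAM, 5, 1))
```

In this sketch the landmark window only ever gains tuples as `now` advances, whereas the sliding window also expires old ones; that difference in update pattern is the kind of property the monotonicity analysis is concerned with.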
