Enhanced stream processing in a DBMS kernel

Continuous query processing has emerged as a promising query processing paradigm with numerous applications. A recent development is the need to handle both streaming queries and typical one-time queries in the same application. For example, data warehousing can greatly benefit from the integration of stream semantics, i.e., online analysis of incoming data and combination with existing data. This is especially useful to provide low latency in data-intensive analysis in big data warehouses that are augmented with new data on a daily basis. However, state-of-the-art database technology cannot handle streams efficiently due to their "continuous" nature. At the same time, state-of-the-art stream technology is purely focused on stream applications. The research efforts are mostly geared towards the creation of specialized stream management systems built with a different philosophy than a DBMS. The drawback of this approach is the limited opportunities to exploit successful past data processing technology, e.g., query optimization techniques. For this new problem we need to combine the best of both worlds. Here we take a completely different route by designing a stream engine on top of an existing relational database kernel. This includes reuse of both its storage/execution engine and its optimizer infrastructure. The major challenge then becomes the efficient support for specialized stream features. This paper focuses on incremental window-based processing, arguably the most crucial streamspecific requirement. In order to maintain and reuse the generic storage and execution model of the DBMS, we elevate the problem at the query plan level. Proper optimizer rules, scheduling and intermediate result caching and reuse, allow us to modify the DBMS query plans for efficient incremental processing. We describe in detail the new approach and we demonstrate efficient performance even against specialized stream engines, especially when scalability becomes a crucial factor.

[1]  Ryan Newton,et al.  XStream: a Signal-Oriented Data Stream Management System , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[2]  Qiming Chen,et al.  Experience in Extending Query Engine for Continuous Analytics , 2010, DaWak.

[3]  Gustavo Alonso,et al.  Flexible and scalable storage management for data-intensive stream processing , 2009, EDBT '09.

[4]  Theodore Johnson,et al.  Gigascope: a stream database for network applications , 2003, SIGMOD '03.

[5]  David Maier,et al.  No pane, no gain: efficient evaluation of sliding-window aggregates over data streams , 2005, SGMD.

[6]  Ryan Newton,et al.  The Case for a Signal-Oriented Data Stream Management System , 2007, CIDR.

[7]  Jeffrey F. Naughton,et al.  Evaluating window joins over unbounded streams , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[8]  Martin L. Kersten,et al.  An architecture for recycling intermediates in a column-store , 2009, SIGMOD Conference.

[9]  David J. DeWitt,et al.  Materialization Strategies in a Column-Oriented DBMS , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[10]  Martin L. Kersten,et al.  Self-organizing tuple reconstruction in column-stores , 2009, SIGMOD Conference.

[11]  Henrik Loeser,et al.  "One Size Fits All": An Idea Whose Time Has Come and Gone? , 2011, BTW.

[12]  Kirk Pruhs,et al.  Algorithms and metrics for processing multiple heterogeneous continuous queries , 2008, TODS.

[13]  Richard Winter,et al.  Large scale data warehousing: Trends and observations , 2010, ICDE.

[14]  James L. Peterson,et al.  Petri Nets , 1977, CSUR.

[15]  Michael J. Franklin,et al.  Continuous Analytics: Rethinking Query Processing in a Network-Effect World , 2009, CIDR.

[16]  Hamid Pirahesh,et al.  Alert: An Architecture for Transforming a Passive DBMS into an Active DBMS , 1991, VLDB.

[17]  Dennis Shasha,et al.  StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time , 2002, VLDB.

[18]  David J. DeWitt,et al.  NiagaraCQ: a scalable continuous query system for Internet databases , 2000, SIGMOD '00.

[19]  Lukasz Golab,et al.  Sliding Window Query Processing over Data Streams , 2006 .

[20]  Jae-Gil Lee,et al.  Continuous query processing in data streams using duality of data and queries , 2006, SIGMOD Conference.

[21]  Ying Li,et al.  Microsoft CEP Server and Online Behavioral Targeting , 2009, Proc. VLDB Endow..

[22]  Rajeev Motwani,et al.  Operator scheduling in data stream systems , 2004, VLDB 2004.

[23]  Frederick Reiss,et al.  TelegraphCQ: Continuous Dataflow Processing for an Uncertain World , 2003, CIDR.

[24]  Jennifer Widom,et al.  STREAM: the stanford stream data manager (demonstration description) , 2003, SIGMOD '03.

[25]  Frank Wm. Tompa,et al.  Efficiently updating materialized views , 1986, SIGMOD '86.

[26]  Martin L. Kersten,et al.  MonetDB/DataCell: Online Analytics in a Streaming Column-Store , 2012, Proc. VLDB Endow..

[27]  Leonid Libkin,et al.  Incremental maintenance of views with duplicates , 1995, SIGMOD '95.

[28]  Jennifer Widom,et al.  STREAM: The Stanford Stream Data Manager , 2003, IEEE Data Eng. Bull..

[29]  Rajeev Rastogi,et al.  Processing complex aggregate queries over data streams , 2002, SIGMOD '02.

[30]  Michael Stonebraker,et al.  Retrospective on Aurora , 2004, The VLDB Journal.

[31]  Philip A. Pinto,et al.  The Large Synoptic Survey Telescope , 2006 .

[32]  Joseph M. Hellerstein,et al.  Lifting the Burden of History from Adaptive Query Processing , 2004, VLDB.

[33]  Jin Zhang,et al.  A demonstration of the MaxStream federated stream processing system , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[34]  Martin Kersten,et al.  Exploiting the power of relational databases for efficient stream processing , 2009, EDBT '09.

[35]  Walid G. Aref,et al.  Incremental Evaluation of Sliding-Window Queries over Data Streams , 2007, IEEE Transactions on Knowledge and Data Engineering.