The CQL continuous query language: semantic foundations and query execution

CQL, a continuous query language, is supported by the STREAM prototype data stream management system (DSMS) at Stanford. CQL is an expressive SQL-based declarative language for registering continuous queries against streams and stored relations. We begin by presenting an abstract semantics that relies only on “black-box” mappings among streams and relations. From these mappings we define a precise and general interpretation for continuous queries. CQL is an instantiation of our abstract semantics using SQL to map from relations to relations, window specifications derived from SQL-99 to map from streams to relations, and three new operators to map from relations to streams. Most of the CQL language is operational in the STREAM system. We present the structure of CQL's query execution plans as well as details of the most important components: operators, interoperator queues, synopses, and sharing of components among multiple operators and queries. Examples throughout the paper are drawn from the Linear Road benchmark recently proposed for DSMSs. We also curate a public repository of data stream applications that includes a wide variety of queries expressed in CQL. The relative ease of capturing these applications in CQL is one indicator that the language contains an appropriate set of constructs for data stream processing.
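The three classes of mappings described in the abstract can be illustrated with a minimal sketch. It is not the STREAM implementation: it assumes discrete integer timestamps, models a relation at each time instant as a bag (a Python Counter), and uses a simple filter in place of an SQL query. The relation-to-stream operators are named after CQL's Istream, Dstream, and Rstream; the Python function names, the `horizon` parameter, and the inclusive window boundary are illustrative assumptions.

```python
from collections import Counter

# A stream is a list of (element, timestamp) pairs.
# A relation at time t is a bag of elements, modelled as a Counter.

def range_window(stream, t, width):
    """Stream-to-relation: elements with timestamp in [t - width, t] (assumed inclusive)."""
    return Counter(e for e, ts in stream if t - width <= ts <= t)

def select(relation, predicate):
    """Relation-to-relation: a filter standing in for an SQL query."""
    return Counter({e: n for e, n in relation.items() if predicate(e)})

def istream(rel_at, horizon):
    """Relation-to-stream: emit (e, t) for each element inserted at time t."""
    out, prev = [], Counter()
    for t in range(horizon + 1):
        cur = rel_at(t)
        out.extend((e, t) for e in sorted((cur - prev).elements()))
        prev = cur
    return out

def dstream(rel_at, horizon):
    """Relation-to-stream: emit (e, t) for each element deleted at time t."""
    out, prev = [], Counter()
    for t in range(horizon + 1):
        cur = rel_at(t)
        out.extend((e, t) for e in sorted((prev - cur).elements()))
        prev = cur
    return out

def rstream(rel_at, horizon):
    """Relation-to-stream: emit (e, t) for every element present at time t."""
    out = []
    for t in range(horizon + 1):
        out.extend((e, t) for e in sorted(rel_at(t).elements()))
    return out

# Hypothetical input stream and a width-1 window over it.
stream = [("a", 0), ("b", 1), ("c", 3)]

def windowed(t):
    return range_window(stream, t, 1)

def filtered(t):
    return select(windowed(t), lambda e: e != "b")

inserted = istream(windowed, 5)
deleted = dstream(windowed, 5)
snapshot = rstream(windowed, 5)
inserted_filtered = istream(filtered, 5)
```

Composing the three kinds of functions, as in `istream(filtered, 5)`, mirrors the way a continuous query in this model goes from streams to relations, through a relational query, and back to a stream.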