Exploiting Punctuation Semantics in Continuous Data Streams

Because most current query processing architectures are already pipelined, it seems logical to apply them to data streams. However, two classes of query operators are impractical for processing long or infinite data streams. Unbounded stateful operators maintain state with no upper bound on its size and so eventually run out of memory. Blocking operators read an entire input before emitting a single output and so might never produce a result. We believe that a priori knowledge of a data stream can permit the use of such operators in some cases. We discuss a kind of stream semantics called punctuated streams. Punctuations in a stream mark the end of substreams, allowing us to view an infinite stream as a mixture of finite streams. We introduce three kinds of invariants to specify the proper behavior of operators in the presence of punctuation. Pass invariants define when results can be passed on. Keep invariants define what must be kept in local state to continue successful operation. Propagation invariants define when punctuation can be passed on. We report on our initial implementation and show a strategy for proving that implementations of these invariants are faithful to their relational counterparts.
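To make the three invariants concrete, here is a minimal sketch of a grouped count, a normally blocking operator, running over a punctuated stream. It is an illustration under simplifying assumptions, not the implementation the abstract reports on: the names `Item`, `Punctuation`, `Count`, and `punctuated_count` are hypothetical, and a punctuation is reduced to the statement "no later item carries this key", whereas punctuations in general may describe richer sets of tuples.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, Iterator, Union


@dataclass(frozen=True)
class Item:
    """An ordinary stream tuple: a grouping key and a payload (hypothetical type)."""
    key: Hashable
    value: int


@dataclass(frozen=True)
class Punctuation:
    """Asserts that no later Item in the stream carries this key (simplified)."""
    key: Hashable


@dataclass(frozen=True)
class Count:
    """Result tuple: the number of Items seen for one key."""
    key: Hashable
    count: int


def punctuated_count(
    stream: Iterable[Union[Item, Punctuation]],
) -> Iterator[Union[Count, Punctuation]]:
    """Grouped count that emits a group's result once that group is punctuated."""
    state: Dict[Hashable, int] = {}
    for element in stream:
        if isinstance(element, Item):
            # Keep: only a running count per still-open key is retained.
            state[element.key] = state.get(element.key, 0) + 1
        else:
            # Pass: the punctuated group is complete, so its result can be emitted.
            # Keep: state for that key is no longer needed and is dropped here.
            if element.key in state:
                yield Count(element.key, state.pop(element.key))
            # Propagation: no further Count for this key will be produced,
            # so the punctuation can be forwarded downstream.
            yield Punctuation(element.key)
```

In this sketch the operator's state is bounded by the number of keys that are open at once, and a result appears as soon as its group is closed, even if the stream itself never ends. Keys that are never punctuated stay in `state` indefinitely, which is exactly the case where the unbounded or blocking behavior returns.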
