Simultaneous Equation Systems for Query Processing on Continuous-Time Data Streams

We introduce Pulse, a framework for processing continuous queries over models of continuous-time data; such models can compactly and accurately represent many real-world activities and processes. Pulse implements several query operators, including filters, aggregates and joins, that work by solving simultaneous equation systems, which in many cases is significantly cheaper than processing a stream of tuples. Pulse translates regular queries to operate on continuous-time inputs, reducing computational overhead and latency while meeting user-specified error bounds on query results. For error-bound checking, Pulse uses an approximate query inversion technique that ensures the solver executes infrequently: only when errors are present or when no previous results are known. We first discuss the high-level design of Pulse, which we fully implemented in a stream processing system. We then characterise Pulse's behavior through experiments with real data, including financial data from the New York Stock Exchange and spatial data from the Automatic Identification System for tracking naval vessels. Our results verify that Pulse is practical and demonstrate significant performance gains for a variety of workloads and query types.
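To illustrate the idea of evaluating an operator by solving an equation rather than scanning tuples, here is a minimal sketch of a filter over a single linear model segment. It is not the Pulse implementation: the `Segment` class, the `filter_greater` function, and the choice of a piecewise-linear model are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Segment:
    """Hypothetical model segment: x(t) = intercept + slope * t on [t_start, t_end)."""
    t_start: float
    t_end: float
    intercept: float
    slope: float


def filter_greater(seg: Segment, threshold: float) -> Optional[Tuple[float, float]]:
    """Return the time interval within the segment where x(t) > threshold, or None.

    Rather than testing every sampled tuple, solve
    intercept + slope * t = threshold once for the crossing time.
    """
    if seg.slope == 0.0:
        # Constant model: the predicate holds everywhere or nowhere.
        return (seg.t_start, seg.t_end) if seg.intercept > threshold else None

    t_cross = (threshold - seg.intercept) / seg.slope
    if seg.slope > 0:
        # Rising model: the predicate holds after the crossing.
        lo, hi = max(seg.t_start, t_cross), seg.t_end
    else:
        # Falling model: the predicate holds before the crossing.
        lo, hi = seg.t_start, min(seg.t_end, t_cross)
    return (lo, hi) if lo < hi else None


rising = Segment(t_start=0.0, t_end=10.0, intercept=1.0, slope=0.5)
result = filter_greater(rising, threshold=3.0)
```

In this sketch one division replaces a comparison per tuple, and the output is itself a time interval, so a downstream operator could consume it in the same model form.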
