Continually evaluating similarity-based pattern queries on a streaming time series

In many applications, local or remote sensors send in streams of data, and the system must monitor the streams to discover relevant events or patterns and react to them immediately. An important scenario is one in which the incoming stream is a continually appended time series and the patterns are time series stored in a database. Each time a new value arrives (a moment called a time position), the system needs to find, in the database, the nearest or near neighbors of the incoming time series up to that time position. This paper attacks the problem by using the Fast Fourier Transform (FFT) to efficiently compute the cross correlations of time series, which yields, in batch mode, the nearest and near neighbors of the incoming time series at many time positions. To exploit this batch processing while still achieving fast response times, the paper uses prediction methods to predict future values. The FFT is used to compute the cross correlations between the predicted series (together with the values that have already arrived) and the database patterns, and thus to obtain predicted distances between the incoming time series at many future time positions and the database patterns. When the actual data value arrives, the prediction error and the predicted distances are used together to filter out patterns that cannot be the nearest or near neighbors, which provides fast responses. Experiments show that, with reasonable prediction errors, the performance gain is significant.
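The two ideas in the abstract, batch distance computation through FFT cross correlation and filtering by prediction error, can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names `sliding_distances` and `filter_candidates`, the use of Euclidean distance, and the triangle-inequality pruning rule are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def sliding_distances(stream, pattern):
    """Euclidean distance between `pattern` and every length-m window of
    `stream`, obtained in one batch from an FFT-based cross correlation."""
    stream = np.asarray(stream, dtype=float)
    pattern = np.asarray(pattern, dtype=float)
    n, m = len(stream), len(pattern)
    size = n + m - 1  # zero-pad so circular convolution equals linear convolution
    # Cross correlation = convolution of the stream with the reversed pattern.
    spectrum = np.fft.rfft(stream, size) * np.fft.rfft(pattern[::-1], size)
    corr = np.fft.irfft(spectrum, size)
    dot = corr[m - 1:n]  # dot[i] = <stream[i:i+m], pattern>
    # Sum of squares of each window, from a cumulative sum.
    csum = np.concatenate(([0.0], np.cumsum(stream ** 2)))
    window_sq = csum[m:] - csum[:-m]
    # ||w - p||^2 = ||w||^2 + ||p||^2 - 2 <w, p>
    sq = window_sq + np.sum(pattern ** 2) - 2.0 * dot
    return np.sqrt(np.maximum(sq, 0.0))  # clamp tiny negative round-off

def filter_candidates(predicted_dists, prediction_error):
    """Indices of patterns that can still be the nearest neighbor.

    Assumption: `prediction_error` is the Euclidean distance between the
    predicted window and the actual window, so by the triangle inequality
    each true distance lies within predicted_dists[i] +/- prediction_error.
    """
    predicted_dists = np.asarray(predicted_dists, dtype=float)
    # The true nearest-neighbor distance cannot exceed this bound.
    upper = predicted_dists.min() + prediction_error
    # Drop patterns whose lower bound already exceeds it.
    return np.flatnonzero(predicted_dists - prediction_error <= upper)
```

In this sketch, `sliding_distances` would be run ahead of time on the predicted series for each database pattern; when the actual value arrives, only the patterns returned by `filter_candidates` would need an exact distance computation.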
