Stream Monitoring under the Time Warping Distance

The goal of this paper is to monitor numerical streams and to find subsequences that are similar to a given query sequence under the dynamic time warping (DTW) distance. Applications include word spotting, sensor pattern matching, and the monitoring of biomedical signals (e.g., ECG/EKG) and environmental signals (e.g., seismic and volcanic). DTW is a very popular distance measure that permits accelerations and decelerations along the time axis, but it has been studied mainly for finite, stored sequence sets. In many applications, such as network analysis and sensor monitoring, massive amounts of data arrive continuously and it is infeasible to store all the historical data. We propose SPRING, a novel algorithm that solves this problem. We provide a theoretical analysis and prove that SPRING does not sacrifice accuracy, while requiring space and time per time-tick that are constant with respect to the stream length. These are dramatic improvements over the naive method. Our experiments on real and realistic data illustrate that SPRING detects the qualifying subsequences correctly and that it is dramatically faster than the naive implementation.
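To make the problem setting concrete, the following is a minimal Python sketch of streaming subsequence matching under DTW. It is an illustration written for this summary, not the SPRING algorithm as specified in the paper: the class name `SubsequenceDTWMonitor`, the method `update`, and the squared-difference cost are assumptions, and SPRING's policy for reporting non-overlapping matches is omitted. The sketch keeps one column of the warping matrix (one cell per query element), so its state does not grow with the stream, and it lets a match begin at any tick by treating row 0 as a zero-cost start.

```python
import math


class SubsequenceDTWMonitor:
    """Hypothetical helper: tracks the best DTW match ending at the current tick."""

    def __init__(self, query):
        self.query = list(query)
        m = len(self.query)
        # dist[i]: cheapest cost of aligning query[:i] with a subsequence
        # ending at the previous tick; row 0 is a free starting point.
        self.dist = [0.0] + [math.inf] * m
        # start[i]: first tick (1-based) of the subsequence behind dist[i].
        self.start = [0] * (m + 1)
        self.t = 0

    def update(self, x):
        """Consume one stream value; return (distance, start_tick, end_tick)."""
        self.t += 1
        m = len(self.query)
        new_dist = [0.0] * (m + 1)
        new_start = [self.t] * (m + 1)  # a match may begin at this tick
        for i in range(1, m + 1):
            cost = (x - self.query[i - 1]) ** 2  # assumed local cost
            # The same-tick candidate is listed first so that ties are
            # resolved in its favour (min returns the first minimum).
            candidates = [
                (new_dist[i - 1], new_start[i - 1]),    # same tick, previous query element
                (self.dist[i], self.start[i]),          # previous tick, same query element
                (self.dist[i - 1], self.start[i - 1]),  # previous tick, previous query element
            ]
            best_cost, best_start = min(candidates, key=lambda c: c[0])
            new_dist[i] = cost + best_cost
            new_start[i] = best_start
        self.dist, self.start = new_dist, new_start
        return new_dist[m], new_start[m], self.t


monitor = SubsequenceDTWMonitor([1, 2, 3])
for value in [0, 0, 1, 2, 2, 3]:
    distance, start_tick, end_tick = monitor.update(value)
print(distance, start_tick, end_tick)
```

A caller would typically compare the returned distance against a threshold at each tick. In the usage above, the query 1, 2, 3 is matched by the stretched stream segment 1, 2, 2, 3, which is the kind of deceleration that DTW tolerates.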
