Shape-based retrieval of similar subsequences in time-series databases

This paper deals with the problem of shape-based retrieval in time-series databases. Shape-based retrieval is the operation of searching for the (sub)sequences whose shapes are similar to that of a given query sequence. In this paper, we propose an effective and efficient approach for shape-based retrieval of subsequences. We first introduce a new similarity model for shape-based retrieval that supports various combinations of transformations, such as shifting, scaling, moving average, and time warping. For efficient processing of shape-based retrieval, we also propose indexing and query processing methods. To verify the superiority of our approach, we perform extensive experiments with real-world S&P 500 stock data. The results reveal that our approach successfully finds all the subsequences whose shapes are similar to that of the query sequence, and also achieves a significant speedup over the sequential scan method.
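The abstract names the transformations but not the similarity model itself. As a rough illustration only, and not the paper's method, the sketch below combines them in one plausible way inside a naive sequential scan, the baseline the abstract compares against. Assumptions: shifting and scaling are cancelled by z-normalization, the moving average is applied before normalizing, the time warping distance uses |x - y| as its per-pair cost, and candidate windows have the same length as the query. The names `normalize`, `moving_average`, `dtw`, `transform` and `sequential_scan` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def normalize(seq: Sequence[float]) -> List[float]:
    """Cancel shifting (offset) and scaling (amplitude) via z-normalization (assumed)."""
    mu = mean(seq)
    sigma = pstdev(seq)
    if sigma == 0:
        # A flat sequence has no shape beyond its offset; map it to zeros.
        return [0.0] * len(seq)
    return [(x - mu) / sigma for x in seq]


def moving_average(seq: Sequence[float], k: int) -> List[float]:
    """k-order moving average; the result has len(seq) - k + 1 points."""
    if not 1 <= k <= len(seq):
        raise ValueError("order k must satisfy 1 <= k <= len(seq)")
    window = sum(seq[:k])
    out = [window / k]
    for i in range(k, len(seq)):
        # Slide the window: add the newest point, drop the oldest one.
        window += seq[i] - seq[i - k]
        out.append(window / k)
    return out


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Time warping distance with |a_i - b_j| as the per-pair cost (assumed)."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest warping path: advance a only, b only, or both.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def transform(seq: Sequence[float], k: int) -> List[float]:
    """Hypothetical pipeline: smooth with a k-order moving average, then normalize."""
    return normalize(moving_average(seq, k))


def sequential_scan(data: Sequence[float], query: Sequence[float],
                    eps: float, k: int = 1) -> List[int]:
    """Start offsets of len(query)-long windows whose transformed shape is within eps."""
    q = transform(query, k)
    w = len(query)
    hits = []
    for start in range(len(data) - w + 1):
        if dtw(transform(data[start:start + w], k), q) <= eps:
            hits.append(start)
    return hits


data = [5, 6, 7, 6, 5, 10, 12, 14, 12, 10, 3, 3, 3]
query = [1, 2, 3, 2, 1]
print(sequential_scan(data, query, eps=1e-9))  # [0, 5]
```

The windows at offsets 0 and 5 are a shifted copy and a shifted-and-doubled copy of the query, so after normalization their distance to the query is zero up to floating-point rounding, while every other window differs already at its first or last point. The scan compares every window against the query, which is the cost an index is meant to avoid; this sketch does not attempt to reproduce the paper's indexing or query processing methods.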
