Mining for similarities in aligned time series using wavelets

Discovery of non-obvious relationships between time series is an important problem in many domains, such as financial, sensory, and scientific data analysis. We consider data mining in aligned time series, which arise, e.g., in numerous online monitoring applications, and we are interested in finding time series that reflect the same external events. Such time series can differ in vertical position, scale, and overall trend, yet still show related features at the same locations. The features can be short-term, such as small peaks and turns, or long-term, such as wider mountains and valleys. We propose using a wavelet transformation of a time series to produce a natural set of features for the sequence. The wavelet transformation yields features that describe properties of the sequence both at various locations and at varying time granularities. In the proposed method, these features are processed so that they are insensitive to changes in the vertical position, scaling, and overall trend of the time series. We discuss the use of these features in data mining tasks such as clustering. We demonstrate how the features allow a flexible analysis of different aspects of similarity: we show how one can examine how the similarity between time series changes as a function of time or as a function of the time granularity considered. We present experimental results with real financial data sets. The experiments indicate that the proposed method can produce useful results; for instance, important similarities can be found between time series that would be considered unrelated by visual inspection. Experiments with compression give encouraging results for the application of the method in mining massive time series data sets.
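
The idea of wavelet features that ignore vertical position, scale, and overall trend can be illustrated with a minimal sketch. The abstract does not specify the wavelet or the processing steps, so everything below is an assumption made for illustration: a Haar wavelet, removal of a least-squares line to handle offset and linear trend, and division by the overall norm to handle (positive) scaling. The function names `detrend`, `haar_details`, `wavelet_features`, and `level_distance` are hypothetical and do not come from the paper.

```python
import math

def detrend(x):
    """Subtract the least-squares line (removes offset and linear trend)."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    slope = sxy / sxx
    return [v - x_mean - slope * (t - t_mean) for t, v in enumerate(x)]

def haar_details(x):
    """Haar detail coefficients per level, finest level first.

    Assumes len(x) is a power of two and at least 2.
    """
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two and at least 2")
    root2 = math.sqrt(2)
    approx = list(x)
    levels = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        levels.append([(a - b) / root2 for a, b in pairs])  # local differences
        approx = [(a + b) / root2 for a, b in pairs]        # local averages
    return levels

def wavelet_features(x):
    """Detrended Haar details, normalized to unit overall norm (assumed scheme)."""
    levels = haar_details(detrend(x))
    norm = math.sqrt(sum(c * c for level in levels for c in level))
    if norm == 0:
        return levels  # the input was exactly linear; nothing to normalize
    return [[c / norm for c in level] for level in levels]

def level_distance(fa, fb, level):
    """Euclidean distance between two feature sets at one time granularity."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fa[level], fb[level])))

# Two series with the same shape but different offset, scale, and trend.
base = [1.0, 3.0, 2.0, 5.0, 4.0, 4.5, 7.0, 6.0]
other = [3.0 * v + 10.0 + 0.5 * t for t, v in enumerate(base)]
fa, fb = wavelet_features(base), wavelet_features(other)
for level in range(len(fa)):
    print(level, level_distance(fa, fb, level))
```

Because the features are grouped by level, distances can be compared per time granularity, and restricting the sum to a range of positions within a level gives a comparison per time interval. This sketch only handles positive scaling; a negatively scaled series would yield features of opposite sign.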
