Locally adaptive dimensionality reduction for indexing large time series databases

Similarity search in large time series databases has recently attracted considerable research interest. It is a difficult problem because of the typically high dimensionality of the data. The most promising solutions perform dimensionality reduction on the data and then index the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), and the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique, which we call Adaptive Piecewise Constant Approximation (APCA). Whereas previous techniques (e.g., SVD, DFT and DWT) choose a common representation for all the items in the database that minimizes the global reconstruction error, APCA approximates each time series by a set of constant-value segments of varying lengths, chosen so that the reconstruction error of that individual series is minimal. We show how APCA can be indexed using a multidimensional index structure. We propose two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching: a lower-bounding Euclidean distance approximation, and a non-lower-bounding but very tight Euclidean distance approximation. We show how they support fast exact searching, and even faster approximate searching, on the same index structure. We compare APCA, both theoretically and empirically, with the other techniques and demonstrate its superiority.
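The abstract does not spell out how the segments are chosen or how the two distance measures are computed, so the following is only a minimal Python sketch under stated assumptions. Segments are found by a simple greedy bottom-up merge (an assumption, not necessarily the construction used in this work); each segment stores its mean value and its exclusive right endpoint; `d_lb` projects the query onto the candidate's segmentation, and `d_ae` compares the raw query with the candidate's reconstructed step function. The names `Segment`, `apca`, `reconstruct`, `d_ae` and `d_lb` are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    value: float  # constant value of the segment: mean of the points it covers
    right: int    # exclusive right endpoint (0-based indexing, an assumption)


def apca(series, m):
    """Approximate `series` by m constant segments of varying length.

    Hypothetical greedy bottom-up merge: start with one segment per point and
    repeatedly merge the adjacent pair whose merge adds the least squared error.
    """
    if not 1 <= m <= len(series):
        raise ValueError("m must satisfy 1 <= m <= len(series)")
    segs = [[float(x), 1] for x in series]  # each entry: [sum of values, length]
    while len(segs) > m:
        best_i, best_cost = 0, math.inf
        for i in range(len(segs) - 1):
            (sa, na), (sb, nb) = segs[i], segs[i + 1]
            # Increase in squared error when both groups are replaced by their joint mean
            cost = na * nb / (na + nb) * (sa / na - sb / nb) ** 2
            if cost < best_cost:
                best_i, best_cost = i, cost
        (sa, na), (sb, nb) = segs[best_i], segs[best_i + 1]
        segs[best_i:best_i + 2] = [[sa + sb, na + nb]]
    out, right = [], 0
    for s, n in segs:
        right += n
        out.append(Segment(s / n, right))
    return out


def reconstruct(segs):
    """Expand the segments back into a full-length step function."""
    out, left = [], 0
    for seg in segs:
        out.extend([seg.value] * (seg.right - left))
        left = seg.right
    return out


def d_ae(query, segs):
    """Approximate distance: raw query vs. reconstructed APCA (not a lower bound)."""
    return math.dist(query, reconstruct(segs))


def d_lb(query, segs):
    """Lower bound: average the query over each of the candidate's segments."""
    total, left = 0.0, 0
    for seg in segs:
        n = seg.right - left
        qv = sum(query[left:seg.right]) / n  # query's mean on this segment
        total += n * (qv - seg.value) ** 2
        left = seg.right
    return math.sqrt(total)


if __name__ == "__main__":
    c = [0.0, 0.0, 0.0, 5.0, 5.0, 1.0, 1.0, 1.0]
    segs = apca(c, 3)
    print(segs)  # three segments of lengths 3, 2 and 3
    q = [1.0, 0.0, 2.0, 4.0, 6.0, 0.0, 1.0, 3.0]
    print(d_lb(q, segs), d_ae(q, segs), math.dist(q, c))
```

Because each `value` is the mean of its segment, `d_lb` equals the Euclidean distance between the orthogonal projections of the query and the candidate onto step functions with the candidate's breakpoints, so it can never exceed the true distance. `d_ae`, by the triangle inequality, differs from the true distance by at most the candidate's reconstruction error: it is tight when the approximation is faithful, but it can over- or underestimate the true distance. The greedy merge is quadratic in the series length and is meant only to illustrate the representation, not to be efficient.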
