Using Multi-Scale Histograms to Answer Pattern Existence and Shape Match Queries

Similarity-based queries over time series data fall into two categories: pattern existence queries and shape match queries. Pattern existence queries find the time series that contain certain patterns, while shape match queries look for the time series that have similar movement shapes. Existing proposals address one type or the other, but not both. In this paper, we propose multi-scale time series histograms that can be used to answer both types of queries, thus offering users more flexibility. Multiple histogram levels allow querying at various precision levels. Most importantly, the distances between time series histograms at a lower scale are lower bounds of the distances at a higher scale, which guarantees that no false dismissals are introduced when a multi-step filtering process is used to answer shape match queries. We further propose using averages of time series histograms to reduce dimensionality and avoid computing the distances between full time series histograms. The experimental results show that multi-scale histograms can effectively find patterns in time series data and answer shape match queries, even when the data contain noise, time shifting and scaling, or amplitude shifting and scaling.
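The lower-bounding property behind the multi-step filter can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: equal-width bins over a fixed value range, coarser scales built by merging adjacent bin pairs, and a plain L1 distance between normalized histograms (the paper's own distance function may differ). Under these assumptions the triangle inequality gives |(a1 + a2) - (b1 + b2)| <= |a1 - b1| + |a2 - b2|, so the distance at a coarser scale never exceeds the distance at a finer one. All function names here are hypothetical.

```python
from typing import List, Sequence


def histogram(series: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Normalized equal-width histogram of a non-empty series over [lo, hi]."""
    counts = [0.0] * bins
    width = (hi - lo) / bins
    for v in series:
        # Clamp so that v == hi and out-of-range values land in an edge bin.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1.0
    return [c / len(series) for c in counts]


def coarsen(hist: List[float]) -> List[float]:
    """Merge adjacent bin pairs to obtain the next coarser scale."""
    return [hist[i] + hist[i + 1] for i in range(0, len(hist), 2)]


def multi_scale(series: Sequence[float], lo: float, hi: float, levels: int) -> List[List[float]]:
    """Histograms ordered from coarsest (2 bins) to finest (2**levels bins)."""
    scales = [histogram(series, lo, hi, 2 ** levels)]
    while len(scales[-1]) > 2:
        scales.append(coarsen(scales[-1]))
    return scales[::-1]


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two histograms with the same number of bins."""
    return sum(abs(x - y) for x, y in zip(a, b))


def range_query(query, database, lo, hi, levels, eps) -> List[int]:
    """Indices of series whose finest-scale distance to the query is <= eps.

    Scales are checked coarsest first; all() stops at the first scale whose
    distance exceeds eps. Because coarse distances lower-bound fine ones,
    this early pruning cannot discard a true match.
    """
    q_scales = multi_scale(query, lo, hi, levels)
    hits = []
    for idx, series in enumerate(database):
        s_scales = multi_scale(series, lo, hi, levels)
        if all(l1(q, s) <= eps for q, s in zip(q_scales, s_scales)):
            hits.append(idx)
    return hits


query = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
database = [
    [0.5, 0.6, 0.7, 0.8, 0.1, 0.2, 0.3, 0.4],  # same values, shifted in time
    [0.9] * 8,                                  # very different value distribution
]
matches = range_query(query, database, lo=0.0, hi=1.0, levels=3, eps=0.5)
```

In this toy run the time-shifted series has identical histograms at every scale, so it is retained, while the constant series is already rejected at the 2-bin scale and its finer histograms never need to be compared. A histogram of values alone ignores ordering, so this sketch shows only the filtering idea, not how the paper encodes movement shapes.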
