Similarity measure based on partial information of time series

Measuring similarity between time series is an important subroutine in many KDD applications. Previous similarity models focus mainly on prominent series behaviors and consider the whole information of a time series. In this paper, we address the following problem: which portion of the information is most suitable for measuring similarity in data collected from a given field? We propose a model for retrieving and representing partial information in time series data, and a methodology for evaluating similarity measurements based on that partial information. The methodology retrieves various portions of information from the raw data and represents each in a concise form, then clusters the time series using the partial information and evaluates the similarity measurements by comparing the clustering results with a standard classification. Experiments on a stock market data set yield some interesting observations and justify the usefulness of our approach.
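
The evaluation loop described above (retrieve a portion of information, cluster on it, compare against a standard classification) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's actual model. The trend/detail split by moving average, the average-link clustering, the purity score, the toy series, and every function name are hypothetical choices made only for this example.

```python
import math
from itertools import combinations

def moving_average(series, window=5):
    """Centered moving average; the window is truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

def partial_info(series, portion):
    """Assumed 'portions': the smooth trend, or the detail left after removing it."""
    trend = moving_average(series)
    if portion == "trend":
        return trend
    if portion == "detail":
        return [x - t for x, t in zip(series, trend)]
    raise ValueError(f"unknown portion: {portion}")

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster(vectors, k):
    """Average-link agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(euclidean(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b])
            dist = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def purity(predicted, reference):
    """Fraction of series that fall in the majority reference class of their cluster."""
    hits = 0
    for c in set(predicted):
        members = [r for p, r in zip(predicted, reference) if p == c]
        hits += max(members.count(r) for r in set(members))
    return hits / len(reference)

# Toy data (hypothetical): two rising and two falling series, each carrying
# one of two small oscillation patterns.
n = 20
fast = [[1, -1][i % 2] for i in range(n)]
slow = [[1, 1, -1, -1][i % 4] for i in range(n)]
data = [
    [i + fast[i] for i in range(n)],            # rising, fast oscillation
    [i + slow[i] for i in range(n)],            # rising, slow oscillation
    [(n - 1 - i) + fast[i] for i in range(n)],  # falling, fast oscillation
    [(n - 1 - i) + slow[i] for i in range(n)],  # falling, slow oscillation
]
reference = [0, 0, 1, 1]  # assumed standard classification: rising vs falling

scores = {}
for portion in ("trend", "detail"):
    features = [partial_info(s, portion) for s in data]
    scores[portion] = purity(cluster(features, k=2), reference)
print(scores)
```

On this toy data, the trend portion reproduces the reference classes (purity 1.0), while the detail portion groups the series by oscillation pattern instead (purity 0.5). This illustrates why the choice of portion matters for a given field.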
