Cluster time series based on partial information

Cluster analysis has been studied extensively for many years. Recently, there has been considerable interest in applying clustering techniques to time series data mining. Previous clustering models focus mainly on the overall and most prominent behaviors of a series. We address the problem of clustering time series using only a portion of the information they contain. We describe a model for retrieving and representing partial information in time series data and, based on this model, propose a methodology for partial-information-based clustering. We evaluate our approach by comparing its results with a standard classification. The results show that our approach can outperform the previous clustering method that is based on the whole information of the time series.
