An online algorithm for segmenting time series

In recent years, there has been an explosion of interest in mining time-series databases. As with most computer science problems, representation of the data is the key to efficient and effective solutions. One of the most commonly used representations is piecewise linear approximation. This representation has been used by various researchers to support clustering, classification, indexing and association rule mining of time-series data. A variety of algorithms have been proposed to obtain this representation, and several of them have been independently rediscovered more than once. In this paper, we undertake the first extensive review and empirical comparison of all proposed techniques. We show that all these algorithms have fatal flaws from a data-mining perspective. We introduce a novel algorithm that we empirically show to be superior to all others in the literature.
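
To make the notion of a piecewise linear approximation concrete, the following is a minimal sketch of a generic sliding-window segmenter. It is an illustration only and is not the novel algorithm introduced in this paper. The function names, the use of linear interpolation between segment endpoints, the maximum-deviation error measure, and the `max_error` threshold are all assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple


def interpolation_error(series: Sequence[float], start: int, end: int) -> float:
    """Largest absolute deviation of series[start..end] from the straight
    line joining its two endpoints (an assumed error measure)."""
    if end - start < 2:
        return 0.0  # a segment with no interior points fits exactly
    slope = (series[end] - series[start]) / (end - start)
    return max(
        abs(series[i] - (series[start] + slope * (i - start)))
        for i in range(start + 1, end)
    )


def sliding_window_segments(
    series: Sequence[float], max_error: float
) -> List[Tuple[int, int]]:
    """Grow each segment one point at a time and close it just before the
    interpolation error would exceed max_error. Returns (start, end) index
    pairs; consecutive segments share an endpoint."""
    segments: List[Tuple[int, int]] = []
    n = len(series)
    start = 0
    end = 1
    while end < n:
        # Extend the current segment while the next point still fits.
        while end + 1 < n and interpolation_error(series, start, end + 1) <= max_error:
            end += 1
        segments.append((start, end))
        # The next segment begins where this one ended.
        start = end
        end = start + 1
    return segments


if __name__ == "__main__":
    # A triangular series: one rising line followed by one falling line.
    data = [0, 1, 2, 3, 2, 1, 0]
    print(sliding_window_segments(data, max_error=0.1))
```

Because each point is examined as it arrives, a segmenter of this shape can run online; the price is that each breakpoint is chosen without looking ahead, so the result depends heavily on the chosen threshold.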
