Finding time series discord based on bit representation clustering

The problem of finding time series discords has attracted much attention recently because of its numerous applications, and several algorithms have been proposed. However, most of them have a high computational cost and cannot meet the requirements of real applications. In this paper, we propose a novel discord discovery algorithm, BitClusterDiscord, which is based on bit representation clustering. First, we segment the time series using PAA (Piecewise Aggregate Approximation) bit serialization, which captures the main variation characteristics of the series and reduces the influence of noise. Second, we present an improved K-Medoids clustering algorithm that merges patterns with similar variation behavior into a common cluster. Finally, based on the bit representation clustering, we design two pruning strategies and propose an effective algorithm for time series discord discovery. Extensive experiments demonstrate that the proposed approach not only finds time series discords effectively but also greatly improves computational efficiency.
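As a rough illustration of the first step, the sketch below computes a PAA representation and turns it into a bit string. The abstract does not define the bit serialization rule, so the rule used here (a bit is 1 when the segment mean exceeds the mean of the whole series) is an assumption, as are the function names `paa`, `paa_bits`, and `hamming`. It is not the BitClusterDiscord implementation.

```python
from statistics import fmean


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    n = len(series)
    if n_segments <= 0 or n % n_segments != 0:
        raise ValueError("this sketch requires len(series) divisible by n_segments")
    width = n // n_segments
    return [fmean(series[i:i + width]) for i in range(0, n, width)]


def paa_bits(series, n_segments):
    """Assumed bit rule: 1 if a segment mean is above the overall mean, else 0."""
    overall = fmean(series)
    return [1 if m > overall else 0 for m in paa(series, n_segments)]


def hamming(bits_a, bits_b):
    """Number of positions at which two equal-length bit lists differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("bit lists must have the same length")
    return sum(a != b for a, b in zip(bits_a, bits_b))


series = [1, 1, 5, 5, 9, 9, 2, 2]
segment_means = paa(series, 4)   # [1.0, 5.0, 9.0, 2.0]
bits = paa_bits(series, 4)       # [0, 1, 1, 0], overall mean is 4.25
```

A distance such as `hamming` over these bit strings is one plausible way to group patterns with similar variation behavior before clustering. The measure the paper actually uses is not stated in the abstract.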
