Scalable Clustering of Time Series with U-Shapelets

A recently introduced primitive for time series data mining, unsupervised shapelets (u-shapelets), has demonstrated significant potential for time series clustering. In contrast to approaches that consider the entire time series to compute pairwise similarities, the u-shapelet technique allows us to consider only the relevant subsequences of the time series. Moreover, u-shapelets allow us to bypass the apparent chicken-and-egg paradox of defining "relevant" with reference to the clustering itself. U-shapelets have several advantages over rival methods. First, they are defined even when the time series are of different lengths; for example, they allow clustering of datasets containing a mixture of single heartbeats and multi-beat ECG recordings. Second, u-shapelets mitigate sensitivity to irrelevant data such as noise, spikes, and dropouts. Finally, u-shapelets have demonstrated the ability to provide additional insights into the data. Unfortunately, the state-of-the-art algorithms for u-shapelet search are intractable, so their advantages have only been demonstrated on tiny datasets. We propose a simple approach that speeds up u-shapelet discovery by two orders of magnitude without any significant loss in clustering quality.
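Two of the advantages listed above (working on series of different lengths, and ignoring irrelevant data) follow from how a candidate u-shapelet is scored: only its best-matching window in each series counts. The sketch below illustrates this under stated assumptions. It uses z-normalized Euclidean distance divided by the square root of the shapelet length, and a simplified "gap" score that splits the sorted distances into a near group A and a far group B and measures gap = (mean_B - std_B) - (mean_A + std_A). The function names, the `min_size` constraint, and the toy data are hypothetical illustrations, not the authors' implementation, and the fast search method proposed in the paper is not shown.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def z_normalize(x: Sequence[float]) -> List[float]:
    """Zero-mean, unit-variance copy of x; (near-)flat segments map to all zeros."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sd < 1e-8:
        return [0.0] * n
    return [(v - mu) / sd for v in x]


def subsequence_distance(shapelet: Sequence[float], series: Sequence[float]) -> float:
    """Minimum z-normalized Euclidean distance between the shapelet and any
    same-length window of the series, divided by sqrt(len(shapelet)) (assumed
    length normalization). Only the best window counts, so the series can have
    any length >= len(shapelet) and data outside that window is ignored."""
    m = len(shapelet)
    if len(series) < m:
        raise ValueError("series is shorter than the shapelet")
    s = z_normalize(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        w = z_normalize(series[start:start + m])
        best = min(best, math.sqrt(sum((a - b) ** 2 for a, b in zip(s, w))))
    return best / math.sqrt(m)


def gap_score(distances: Sequence[float], min_size: int = 2) -> float:
    """Simplified gap: best split of the sorted distances into a near group A
    and a far group B, maximizing (mean_B - std_B) - (mean_A + std_A).
    min_size (hypothetical) is the smallest group size allowed on either side."""
    d = sorted(distances)
    n = len(d)
    if n < 2 * min_size:
        raise ValueError("not enough distances for two groups of min_size")
    best = -math.inf
    for split in range(min_size, n - min_size + 1):
        near, far = d[:split], d[split:]
        best = max(best, (mean(far) - pstdev(far)) - (mean(near) + pstdev(near)))
    return best


# Toy example: a "bump" shapelet scored against series of different lengths.
shapelet = [0, 1, 2, 1, 0]
dataset = [
    [0, 0, 0, 0, 1, 2, 1, 0, 0, 0],        # contains the bump
    [5, 5, 0, 1, 2, 1, 0, 5, 5, 5, 5, 5],  # longer series, also contains it
    [0, 1, 2, 3, 4, 5, 6, 7],              # ramp, no bump
    [0, 2, 4, 6, 8, 10, 12],               # steeper ramp, no bump
]
dists = [subsequence_distance(shapelet, ts) for ts in dataset]
print([round(d, 3) for d in dists])  # [0.0, 0.0, 1.414, 1.414]
print(round(gap_score(dists), 3))    # 1.414
```

A large gap means the candidate cleanly separates the series that contain it from those that do not, regardless of how long each series is; this is what makes a subsequence "relevant" without first fixing a clustering.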

Eamonn J. Keogh | Nurjahan Begum | Liudmila Ulanova
