Scalable Clustering of Time Series with U-Shapelets

A recently introduced primitive for time series data mining, unsupervised shapelets (u-shapelets), has demonstrated significant potential for time series clustering. In contrast to approaches that use the entire time series to compute pairwise similarities, the u-shapelet technique considers only the relevant subsequences of each time series. Moreover, u-shapelets allow us to bypass the apparent chicken-and-egg paradox of defining "relevant" with reference to the clustering itself. U-shapelets have several advantages over rival methods. First, they are defined even when the time series are of different lengths; for example, they allow clustering datasets containing a mixture of single heartbeats and multi-beat ECG recordings. Second, u-shapelets mitigate sensitivity to irrelevant data such as noise, spikes, and dropouts. Finally, u-shapelets have demonstrated an ability to provide additional insights into the data. Unfortunately, the state-of-the-art algorithms for u-shapelet search are intractable, and so their advantages have only been demonstrated on tiny datasets. We propose a simple approach that speeds up u-shapelet discovery by two orders of magnitude, without any significant loss in clustering quality.
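To make the idea of comparing only a relevant subsequence concrete, the following is a minimal sketch of a subsequence distance: the smallest z-normalized Euclidean distance between a short candidate and any equal-length window of a longer series. This is an illustration under stated assumptions, not the algorithm proposed here. The function names `znorm` and `subsequence_distance`, the length normalization, and the toy data are all hypothetical, and the brute-force scan makes no attempt at the speedup the abstract describes.

```python
import numpy as np


def znorm(x):
    """Z-normalize a 1-D array; a constant array maps to all zeros."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)


def subsequence_distance(candidate, series):
    """Smallest length-normalized Euclidean distance between the
    z-normalized candidate and every z-normalized window of `series`
    that has the same length as the candidate (brute-force scan)."""
    cand = znorm(candidate)
    series = np.asarray(series, dtype=float)
    m = len(cand)
    if len(series) < m:
        raise ValueError("series must be at least as long as the candidate")
    best = np.inf
    for start in range(len(series) - m + 1):
        window = znorm(series[start:start + m])
        dist = np.linalg.norm(cand - window) / np.sqrt(m)
        best = min(best, dist)
    return best


# Hypothetical toy data: series of different lengths are handled uniformly.
pattern = [0, 1, 3, 1, 0]
long_series = [5, 5, 5, 5, 10, 12, 16, 12, 10, 5, 5]  # contains 10 + 2 * pattern
ramp = [0, 1, 2, 3, 4, 5, 6, 7]                       # does not contain the pattern

print(subsequence_distance(pattern, long_series))
print(subsequence_distance(pattern, ramp))
```

Because only the best-matching window contributes, the distance is defined for series of unequal length, and values outside that window (noise, spikes, dropouts) do not affect it. The per-window z-normalization makes the match insensitive to offset and scale, which is why the shifted and scaled copy of the pattern inside `long_series` matches almost exactly.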
