Clustering Time Series Using Unsupervised-Shapelets

Time series clustering has become an increasingly important research topic over the past decade. Most existing methods for time series clustering rely on distances calculated from the entire raw data using the Euclidean distance or Dynamic Time Warping distance as the distance measure. However, the presence of significant noise, dropouts, or extraneous data can greatly limit the accuracy of clustering in this domain. Moreover, for most real world problems, we cannot expect objects from the same class to be equal in length. As a consequence, most work on time series clustering only considers the clustering of individual time series "behaviors," e.g., individual heart beats or individual gait cycles, and contrives the time series in some way to make them all equal in length. However, contriving the data in such a way is often a harder problem than the clustering itself. In this work, we show that by using only some local patterns and deliberately ignoring the rest of the data, we can mitigate the above problems and cluster time series of different lengths, i.e., cluster one heartbeat with multiple heartbeats. To achieve this we exploit and extend a recently introduced concept in time series data mining called shapelets. Unlike existing work, our work demonstrates for the first time the unintuitive fact that shapelets can be learned from unlabeled time series. We show, with extensive empirical evaluation in diverse domains, that our method is more accurate than existing methods. Moreover, in addition to accurate clustering results, we show that our work also has the potential to give insights into the domains to which it is applied.

[1]  Kweku-Muata Osei-Bryson,et al.  Post-pruning in decision tree induction using multiple performance measures , 2007, Comput. Oper. Res..

[2]  Jie Liu,et al.  Fast approximate correlation for massive time-series data , 2010, SIGMOD Conference.

[3]  Gerard Salton,et al.  A vector space model for automatic indexing , 1975, CACM.

[4]  Donald J. Berndt,et al.  Using Dynamic Time Warping to Find Patterns in Time Series , 1994, KDD Workshop.

[5]  Michalis Vazirgiannis,et al.  c ○ 2001 Kluwer Academic Publishers. Manufactured in The Netherlands. On Clustering Validation Techniques , 2022 .

[6]  Konstantinos Kalpakis,et al.  Distance measures for effective clustering of ARIMA time-series , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[7]  Yuan Li,et al.  Rotation-invariant similarity in time series using bag-of-patterns representation , 2012, Journal of Intelligent Information Systems.

[8]  Peter Reinartz,et al.  Compression-based unsupervised clustering of spectral signatures , 2011, 2011 3rd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS).

[9]  Hui Ding,et al.  Querying and mining of time series data: experimental comparison of representations and distance measures , 2008, Proc. VLDB Endow..

[10]  Shusaku Tsumoto,et al.  Cluster Analysis of Time-Series Medical Data Based on the Trajectory Representation and Multiscale Comparison Techniques , 2006, Sixth International Conference on Data Mining (ICDM'06).

[11]  Eamonn J. Keogh,et al.  Clustering of time-series subsequences is meaningless: implications for previous and future research , 2004, Knowledge and Information Systems.

[12]  Dragomir Anguelov,et al.  Mining The Stock Market : Which Measure Is Best ? , 2000 .

[13]  Eamonn J. Keogh,et al.  Time Series Epenthesis: Clustering Time Series Streams Requires Ignoring Some Data , 2011, 2011 IEEE 11th International Conference on Data Mining.

[14]  Aidong Zhang,et al.  Cluster analysis for gene expression data: a survey , 2004, IEEE Transactions on Knowledge and Data Engineering.

[15]  Vladimir Pavlovic,et al.  Isotonic CCA for sequence alignment and activity recognition , 2011, 2011 International Conference on Computer Vision.

[16]  F. Mörchen Time series feature extraction for data mining using DWT and DFT , 2003 .

[17]  Jason Lines,et al.  Classification of Household Devices by Electricity Usage Profiles , 2011, IDEAL.

[18]  Philip S. Yu,et al.  Extracting Interpretable Features for Early Classification on Time Series , 2011, SDM.

[19]  Piotr Indyk,et al.  Mining the stock market (extended abstract): which measure is best? , 2000, KDD '00.

[20]  Eamonn J. Keogh,et al.  On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration , 2002, Data Mining and Knowledge Discovery.

[21]  Eamonn J. Keogh,et al.  Time series shapelets: a new primitive for data mining , 2009, KDD.

[22]  Aristides Gionis,et al.  Correlating financial time series with micro-blogging activity , 2012, WSDM '12.

[23]  Jason Lines,et al.  A shapelet transform for time series classification , 2012, KDD.

[24]  T. Warren Liao,et al.  Clustering of time series data - a survey , 2005, Pattern Recognit..

[25]  Norbert Link,et al.  Prototype Optimization for Temporarily and Spatially Distorted Time Series , 2010, AAAI Spring Symposium: It's All in the Timing.

[26]  Eamonn J. Keogh,et al.  Logical-shapelets: an expressive primitive for time series classification , 2011, KDD.

[27]  Eamonn J. Keogh,et al.  Making Time-Series Classification More Accurate Using Learned Constraints , 2004, SDM.

[28]  Mi Zhang,et al.  Motion primitive-based human activity recognition using a bag-of-features approach , 2012, IHI '12.

[29]  Yang Zhang,et al.  Unsupervised Feature Extraction for Time Series Clustering Using Orthogonal Wavelet Transform , 2006, Informatica.

[30]  William M. Rand,et al.  Objective Criteria for the Evaluation of Clustering Methods , 1971 .

[31]  Xiaozhe Wang,et al.  Characteristic-Based Clustering for Time Series Data , 2006, Data Mining and Knowledge Discovery.

[32]  Hendrik Blockeel,et al.  Web mining research: a survey , 2000, SKDD.