Salient Subsequence Learning for Time Series Clustering

Time series analysis has been a popular research topic over the past decade. Salient subsequences of a time series that benefit a learning task, e.g., classification or clustering, are called shapelets. Shapelet-based time series learning extracts these highly informative subsequences from the time series. Most existing methods for shapelet discovery must scan a large pool of candidate subsequences, which is time-consuming. A recent work [1] uses regression learning to discover shapelets in a time series; however, it only considers learning shapelets from labeled time series data. This paper proposes an Unsupervised Salient Subsequence Learning (USSL) model that discovers shapelets without requiring labels. We develop a new learning function that integrates the strengths of shapelet learning, shapelet regularization, spectral analysis and pseudo-labels to learn, simultaneously and automatically, shapelets that improve the clustering of unlabeled time series. The optimization model is solved iteratively via a coordinate descent algorithm. Experiments show that USSL learns meaningful shapelets, with promising results on real-world and synthetic data that surpass current state-of-the-art unsupervised time series learning methods.
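
To make the notion of a shapelet feature concrete, the following minimal sketch maps each time series to its distance from each shapelet. It is an illustration only, not the USSL model: the abstract does not specify the distance, so the sketch assumes the commonly used definition (the smallest mean squared difference between the shapelet and any equal-length window of the series). The function names `shapelet_distance` and `shapelet_transform` and the toy data are hypothetical.

```python
from typing import List, Sequence


def shapelet_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Smallest mean squared difference between the shapelet and any
    equal-length sliding window of the series (assumed definition)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = float("inf")
    # Slide the shapelet over every window of length m.
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = sum((w - s) ** 2 for w, s in zip(window, shapelet)) / m
        best = min(best, dist)
    return best


def shapelet_transform(
    dataset: Sequence[Sequence[float]],
    shapelets: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Represent each series by its distance to every shapelet."""
    return [[shapelet_distance(ts, sh) for sh in shapelets] for ts in dataset]


# Hypothetical toy data: one series contains a bump, the other is flat.
toy_series = [
    [0.0, 0.0, 1.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
toy_shapelets = [[1.0, 2.0, 1.0]]

features = shapelet_transform(toy_series, toy_shapelets)
print(features)  # [[0.0], [2.0]]
```

In this toy case the series containing the bump has distance 0 to the shapelet and the flat series has distance 2, so a single shapelet feature already separates the two. An unsupervised method such as the one proposed here would learn the shapelets themselves rather than take them as given.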

[1]  Carlos Agón,et al.  Time-series data mining , 2012, CSUR.

[2]  Michalis Vazirgiannis,et al.  On Clustering Validation Techniques , 2001 .

[3]  Fuzhen Zhuang,et al.  Fast Time Series Classification Based on Infrequent Shapelets , 2012, 2012 11th International Conference on Machine Learning and Applications.

[4]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[5]  Eamonn J. Keogh,et al.  Accelerating the discovery of unsupervised-shapelets , 2015, Data Mining and Knowledge Discovery.

[6]  Feiping Nie,et al.  Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization , 2010, NIPS.

[7]  Eamonn J. Keogh,et al.  Finding Unusual Medical Time-Series Subsequences: Algorithms and Applications , 2006, IEEE Transactions on Information Technology in Biomedicine.

[8]  Vishal Monga,et al.  Robust Extrema Features for Time-Series Data Analysis , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Jinhui Tang,et al.  Unsupervised Feature Selection via Nonnegative Spectral Analysis and Redundancy Control , 2015, IEEE Transactions on Image Processing.

[10]  Eamonn J. Keogh,et al.  Clustering Time Series Using Unsupervised-Shapelets , 2012, 2012 IEEE 12th International Conference on Data Mining.

[11]  Chengqi Zhang,et al.  Unsupervised Feature Learning from Time Series , 2016, IJCAI.

[12]  Shie Mannor,et al.  Time Series Analysis Using Geometric Template Matching , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Jing Liu,et al.  Clustering-Guided Sparse Structural Learning for Unsupervised Feature Selection , 2014, IEEE Transactions on Knowledge and Data Engineering.

[14]  Deng Cai,et al.  Laplacian Score for Feature Selection , 2005, NIPS.

[15]  Dan Roth,et al.  Efficient Pattern-Based Time Series Classification on GPU , 2012, 2012 IEEE 12th International Conference on Data Mining.

[16]  Lei Shi,et al.  Robust Spectral Learning for Unsupervised Feature Selection , 2014, 2014 IEEE International Conference on Data Mining.

[17]  Jason Lines,et al.  Classification of time series by shapelet transformation , 2013, Data Mining and Knowledge Discovery.

[18]  ChengXiang Zhai,et al.  Robust Unsupervised Feature Selection , 2013, IJCAI.

[19]  Eamonn J. Keogh,et al.  Logical-shapelets: an expressive primitive for time series classification , 2011, KDD.

[20]  Lars Schmidt-Thieme,et al.  Learning time-series shapelets , 2014, KDD.

[21]  Carl Tim Kelley,et al.  Iterative methods for optimization , 1999, Frontiers in applied mathematics.

[22]  Raymond T. Ng,et al.  Indexing spatio-temporal trajectories with Chebyshev polynomials , 2004, SIGMOD '04.

[23]  Shusaku Tsumoto,et al.  Cluster Analysis of Time-Series Medical Data Based on the Trajectory Representation and Multiscale Comparison Techniques , 2006, Sixth International Conference on Data Mining (ICDM'06).

[24]  Lei Wang,et al.  Efficient Spectral Feature Selection with Minimum Redundancy , 2010, AAAI.

[25]  Huan Liu,et al.  An Unsupervised Feature Selection Framework for Social Media Data , 2014, IEEE Transactions on Knowledge and Data Engineering.

[26]  Mihai Datcu,et al.  Modeling trajectory of dynamic clusters in image time-series for spatio-temporal reasoning , 2005, IEEE Transactions on Geoscience and Remote Sensing.

[27]  Gérard G. Medioni,et al.  Structured Time Series Analysis for Human Action Segmentation and Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Ying Wah Teh,et al.  Time-series clustering - A decade review , 2015, Inf. Syst..

[29]  Jing Liu,et al.  Robust Structured Subspace Learning for Data Representation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Jason Lines,et al.  Time-Series Classification with COTE: The Collective of Transformation-Based Ensembles , 2015, IEEE Transactions on Knowledge and Data Engineering.

[31]  Eamonn J. Keogh,et al.  Time series shapelets: a novel technique that allows accurate, interpretable and fast classification , 2010, Data Mining and Knowledge Discovery.

[32]  Eamonn J. Keogh,et al.  Time series shapelets: a new primitive for data mining , 2009, KDD.

[33]  Eamonn J. Keogh,et al.  Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping , 2012, KDD.

[34]  A. Hoffman,et al.  Lower bounds for the partitioning of graphs , 1973 .

[35]  Eamonn J. Keogh,et al.  Fast Shapelets: A Scalable Algorithm for Discovering Time Series Shapelets , 2013, SDM.

[36]  Arlindo L. Oliveira,et al.  Identification of Regulatory Modules in Time Series Gene Expression Data Using a Linear Time Biclustering Algorithm , 2010, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[37]  Yoshua Bengio,et al.  Scaling Up Spike-and-Slab Models for Unsupervised Feature Learning , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Hao Wang,et al.  Online Feature Selection with Streaming Features , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Aristides Gionis,et al.  Correlating financial time series with micro-blogging activity , 2012, WSDM '12.

[40]  Thomas Brox,et al.  Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Zi Huang,et al.  ℓ2,1-Norm Regularized Discriminative Feature Selection for Unsupervised Learning , 2011, IJCAI.

[42]  Eamonn J. Keogh,et al.  Towards never-ending learning from time series streams , 2013, KDD.

[43]  Yang Zhang,et al.  Unsupervised Feature Extraction for Time Series Clustering Using Orthogonal Wavelet Transform , 2006, Informatica.

[44]  Luis Gravano,et al.  k-Shape: Efficient and Accurate Clustering of Time Series , 2015, SIGMOD Conference.

[45]  William M. Rand,et al.  Objective Criteria for the Evaluation of Clustering Methods , 1971 .

[46]  George C. Runger,et al.  A Bag-of-Features Framework to Classify Time Series , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Lars Schmidt-Thieme,et al.  Ultra-Fast Shapelets for Time Series Classification , 2015, ArXiv.

[48]  Eamonn J. Keogh,et al.  Scalable Clustering of Time Series with U-Shapelets , 2015, SDM.

[49]  Petia Radeva,et al.  Meta-Parameter Free Unsupervised Sparse Feature Learning , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Stephen J. Taylor,et al.  Modelling Financial Time Series (Second Edition) , 2007 .

[51]  Deng Cai,et al.  Unsupervised feature selection for multi-cluster data , 2010, KDD.

[52]  Huan Liu,et al.  Spectral feature selection for supervised and unsupervised learning , 2007, ICML '07.

[53]  Jure Leskovec,et al.  Patterns of temporal variation in online media , 2011, WSDM '11.

[54]  Vince D. Calhoun,et al.  Shapelet Ensemble for Multi-dimensional Time Series , 2015, SDM.

[55]  Fernando De la Torre,et al.  Generalized Canonical Time Warping , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[56]  Man Lung Yiu,et al.  Efficient discovery of longest-lasting correlation in sequence databases , 2016, The VLDB Journal.

[57]  Jason Lines,et al.  A shapelet transform for time series classification , 2012, KDD.

[58]  Eamonn J. Keogh,et al.  Semi-Supervision Dramatically Improves Time Series Clustering under Dynamic Time Warping , 2016, CIKM.

[59]  Pierre Gançarski,et al.  A global averaging method for dynamic time warping, with applications to clustering , 2011, Pattern Recognit..

[60]  Jing Liu,et al.  Unsupervised Feature Selection Using Nonnegative Spectral Analysis , 2012, AAAI.