A Novel Similarity Measurement and Clustering Framework for Time Series Based on Convolution Neural Networks

In recent years, with the development of machine learning and especially the rise of deep learning, time series clustering has proven effective at providing useful information in cloud computing and big data applications. However, many modern clustering algorithms struggle to mine the complex features of time series, which are important for further analysis. Convolutional neural networks (CNNs) provide powerful feature extraction and perform excellently in classification tasks, but they are hard to apply to clustering. Therefore, a similarity measurement method based on convolutional neural networks is proposed. The algorithm converts the number of times the outputs of the convolutional neural network change in the same direction into a similarity between time series, so that the network can mine features of unlabeled data during clustering. In particular, by first collecting a small amount of high-similarity data to create labels, a CNN-based classification algorithm can be used to assist clustering. Extensive experiments on the UCR time series datasets demonstrate the effectiveness of the proposed algorithm, and the results show that it outperforms other leading methods. Compared with other clustering algorithms based on deep networks, the proposed algorithm can output intermediate variables that visually explain how it works. An application to financial stock linkage analysis shows how the method can provide an auxiliary mechanism for investment decision-making.
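The abstract describes the similarity measure only at a high level, so the Python sketch below is one illustrative reading of it, not the authors' implementation. It assumes that each series has a single scalar network output recorded at every training step, defines the similarity of two series as the fraction of consecutive steps in which their outputs change in the same direction, and then merges pairs above a chosen threshold into provisional labels with union-find. The function names `same_direction_similarity` and `seed_labels`, the threshold, and the toy outputs are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def _sign(x: float) -> int:
    """Return +1, -1, or 0 depending on the sign of x."""
    return (x > 0) - (x < 0)


def same_direction_similarity(outputs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise similarity from co-directional output changes (hypothetical sketch).

    outputs[t][i] is the scalar network output for series i recorded at
    training step t (an assumption; the abstract does not say which output
    is tracked). Entry (i, j) of the result is the fraction of consecutive
    steps in which series i and j changed in the same direction.
    """
    steps, n = len(outputs), len(outputs[0])
    counts = [[0] * n for _ in range(n)]
    for t in range(1, steps):
        # Direction of each series' output change between steps t-1 and t.
        signs = [_sign(outputs[t][i] - outputs[t - 1][i]) for i in range(n)]
        for i, j in combinations(range(n), 2):
            if signs[i] == signs[j]:  # a zero change only matches another zero change
                counts[i][j] += 1
                counts[j][i] += 1
    transitions = max(steps - 1, 1)
    sim = [[c / transitions for c in row] for row in counts]
    for i in range(n):
        sim[i][i] = 1.0  # every series is fully similar to itself
    return sim


def seed_labels(sim: List[List[float]], threshold: float) -> Dict[int, int]:
    """Turn high-similarity pairs into provisional labels (hypothetical helper).

    Series linked by a pair with similarity >= threshold are merged into one
    group with union-find; series without such a partner stay unlabeled.
    """
    n = len(sim)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    linked = set()
    for i, j in combinations(range(n), 2):
        if sim[i][j] >= threshold:
            parent[find(i)] = find(j)
            linked.update((i, j))
    roots: Dict[int, int] = {}
    # Number groups 0, 1, 2, ... in order of their lowest series index.
    return {i: roots.setdefault(find(i), len(roots)) for i in sorted(linked)}


# Toy example: series 0 and 1 rise and fall together, as do series 2 and 3.
example_outputs = [
    [0.0, 0.0, 1.0, 1.0],
    [0.1, 0.2, 0.9, 0.8],
    [0.3, 0.4, 0.7, 0.6],
    [0.2, 0.3, 0.8, 0.9],
    [0.5, 0.6, 0.5, 0.4],
]
example_sim = same_direction_similarity(example_outputs)
print(seed_labels(example_sim, threshold=0.9))  # → {0: 0, 1: 0, 2: 1, 3: 1}
```

A CNN classifier trained on such seed labels could then assign the remaining series, which is the role the abstract gives to the classification step; that part is not sketched here.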
