Time series clustering via community detection in networks

In this paper, we propose a technique for time series clustering using community detection in complex networks. First, we present a method to transform a set of time series into a network using different distance functions, where each time series is represented by a vertex and the most similar series are connected by edges. Then, we apply community detection algorithms to identify groups of strongly connected vertices (called communities) and, consequently, identify time series clusters. We also present a comprehensive analysis of how various combinations of time series distance functions, network generation methods and community detection techniques influence the clustering results. Experimental studies show that the proposed network-based approach achieves better results than the various classic and recent clustering techniques under consideration. Statistical tests confirm that the proposed method outperforms some classic clustering algorithms, such as k-medoids, diana, median-linkage and centroid-linkage, on various data sets. Interestingly, the proposed method can effectively detect shape patterns present in time series thanks to the topological structure of the network constructed during the clustering process, whereas the other techniques fail to identify such patterns. Moreover, the proposed method is robust enough to group time series that present similar patterns with time shifts and/or amplitude variations. In summary, the key idea of the proposed method is the transformation of time series from the time-space domain to the topological domain. Therefore, we hope that our approach contributes not only to time series clustering but also to general time series analysis tasks.
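The pipeline summarized above (distance function, then similarity network, then community detection, then clusters) can be sketched in a few lines of Python. This is a minimal, dependency-free illustration built on our own assumptions. It is not the authors' implementation. We use Euclidean distance and an unconstrained dynamic time warping (DTW) distance as example distance functions. We use a k-nearest-neighbour rule and an ε-neighbourhood rule as example network generation methods. For the community detection step we use a simple label propagation algorithm. All function names and the parameters `k` and `eps` are hypothetical. The paper itself compares many combinations of such choices.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Lock-step Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Unconstrained DTW distance (absolute-difference cost); tolerates time shifts."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def knn_network(series, k, dist):
    """Hypothetical k-NN rule: link each series (vertex) to its k nearest series."""
    n = len(series)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        ranked = sorted((dist(series[i], series[j]), j) for j in range(n) if j != i)
        for _, j in ranked[:k]:
            adj[i].add(j)
            adj[j].add(i)  # keep the network undirected
    return adj


def epsilon_network(series, eps, dist):
    """Hypothetical ε rule: link every pair of series whose distance is at most eps."""
    n = len(series)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist(series[i], series[j]) <= eps:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def label_propagation(adj, max_iter=100):
    """Each vertex repeatedly adopts the most frequent label among its neighbours."""
    labels = {v: v for v in adj}  # every vertex starts in its own community
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated vertex keeps its own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[v] in candidates:
                continue  # current label is already a majority label
            labels[v] = min(candidates)  # deterministic tie-breaking
            changed = True
        if not changed:
            break
    return labels


def cluster_time_series(series, k=2, dist=euclidean):
    """Build a k-NN network, detect communities, return one cluster id per series."""
    labels = label_propagation(knn_network(series, k, dist))
    ids = {}
    # Renumber community labels 0, 1, 2, ... in order of first appearance.
    return [ids.setdefault(labels[v], len(ids)) for v in range(len(series))]


data = [
    [0, 1, 2, 3, 4], [0, 1, 2, 3, 5], [1, 1, 2, 3, 4],  # rising shape
    [4, 3, 2, 1, 0], [5, 3, 2, 1, 0], [4, 3, 2, 1, 1],  # falling shape
]
print(cluster_time_series(data, k=2))  # [0, 0, 0, 1, 1, 1]

# A time-shifted copy of the same bump: DTW sees no difference, Euclidean does.
print(dtw([0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0]))        # 0.0
print(euclidean([0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0]))  # 2.0
```

In this toy example, each rising series' two nearest neighbours are the other rising series, and the same holds for the falling series. The network therefore splits into two triangles, and label propagation assigns one community to each. The last two lines show why a warping-based distance can help group series that share a pattern but are shifted in time.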
