Related Papers

Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data

Abstract: Subsequence clustering of multivariate time series is a useful tool for discovering repeated patterns in temporal data. Once these patterns have been discovered, seemingly complicated datasets can be interpreted as a temporal sequence of only a small number of states, or clusters. For example, raw sensor data from a fitness-tracking application can be expressed as a timeline of a select few actions (i.e., walking, sitting, running). However, discovering these patterns is challenging because it requires simultaneous segmentation and clustering of the time series. Furthermore, interpreting the resulting clusters is difficult, especially when the data is high-dimensional. Here we propose a new method of model-based clustering, which we call Toeplitz Inverse Covariance-based Clustering (TICC). Each cluster in the TICC method is defined by a correlation network, or Markov random field (MRF), characterizing the interdependencies between different observations in a typical subsequence of that cluster. Based on this graphical representation, TICC simultaneously segments and clusters the time series data. We solve the TICC problem through alternating minimization, using a variation of the expectation maximization (EM) algorithm. We derive closed-form solutions to efficiently solve the two resulting subproblems in a scalable way, through dynamic programming and the alternating direction method of multipliers (ADMM), respectively. We validate our approach by comparing TICC to several state-of-the-art baselines in a series of synthetic experiments, and we then demonstrate on an automobile sensor dataset how TICC can be used to learn interpretable clusters in real-world scenarios.
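The abstract names dynamic programming as the tool for one of the two subproblems. As a rough illustration only, the sketch below shows a Viterbi-style assignment of time steps to clusters. It assumes that a per-step cost for each cluster is already available (for instance, a negative log-likelihood under that cluster's model) and that temporal consistency is encouraged by a fixed switching penalty. The function name `assign_clusters`, the `cost` layout, and the `beta` penalty are hypothetical choices for this sketch and are not taken from the abstract.

```python
def assign_clusters(cost, beta):
    """Pick one cluster per time step by dynamic programming.

    cost[t][k] is the (assumed, precomputed) cost of putting step t in
    cluster k; beta is a hypothetical penalty paid whenever consecutive
    steps land in different clusters. Returns the minimum-cost path.
    """
    num_steps, num_clusters = len(cost), len(cost[0])
    total = list(cost[0])   # best cost of any path ending in each cluster
    back = []               # back[t-1][k]: predecessor of cluster k at step t

    for t in range(1, num_steps):
        # Cheapest cluster to come from if we decide to switch.
        best_prev = min(range(num_clusters), key=lambda k: total[k])
        new_total, pointers = [], []
        for k in range(num_clusters):
            stay = total[k]
            switch = total[best_prev] + beta
            if switch < stay:
                new_total.append(cost[t][k] + switch)
                pointers.append(best_prev)
            else:
                new_total.append(cost[t][k] + stay)
                pointers.append(k)
        total = new_total
        back.append(pointers)

    # Trace the best path backwards from the cheapest final cluster.
    k = min(range(num_clusters), key=lambda j: total[j])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path


# Toy costs for 4 time steps and 2 clusters (made-up numbers).
toy_cost = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.9], [0.0, 1.0]]
smooth_path = assign_clusters(toy_cost, beta=1.0)
greedy_path = assign_clusters(toy_cost, beta=0.0)
```

With a zero penalty the result reduces to picking the cheapest cluster at each step independently; a larger penalty trades per-step fit for fewer segment boundaries. The pass costs on the order of the number of steps times the number of clusters.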

References

[1] Pradeep Ravikumar, et al. QUIC: quadratic approximation for sparse inverse covariance estimation. J. Mach. Learn. Res., 2014.

[2] Eamonn J. Keogh, et al. Scaling up dynamic time warping for datamining applications. KDD '00, 2000.

[3] Nir Friedman, et al. Probabilistic Graphical Models - Principles and Techniques. 2009.

[4] Stephen P. Boyd, et al. Conic Optimization via Operator Splitting and Homogeneous Self-Dual Embedding. Journal of Optimization Theory and Applications, 2013.

[5] Alexandre d'Aspremont, et al. Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data. 2022.

[6] Andrew J. Viterbi, et al. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory, 1967.

[7] U. Brandes. A faster algorithm for betweenness centrality. 2001.

[8] Johannes Peltola, et al. Activity classification using realistic data from wearable sensors. IEEE Transactions on Information Technology in Biomedicine, 2006.

[9] Martin J. Wainwright, et al. Log-determinant relaxation for approximate inference in discrete Markov random fields. IEEE Transactions on Signal Processing, 2006.

[10] Eric R. Ziegel, et al. The Elements of Statistical Learning. Technometrics, 2003.

[11] Eamonn J. Keogh, et al. Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy. KDD, 2015.

[12] J. Zico Kolter, et al. Sparse Gaussian Conditional Random Fields: Algorithms, Theory, and Application to Energy Forecasting. ICML, 2013.

[13] Eamonn J. Keogh, et al. Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping. KDD, 2012.

[14] Dit-Yan Yeung, et al. Time series clustering with ARMA mixtures. Pattern Recognit., 2004.

[15] Thomas Martinetz, et al. 'Neural-gas' network for vector quantization and its application to time-series prediction. IEEE Trans. Neural Networks, 1993.

[16] A. H. Shirazi, et al. Network analysis of a financial market based on genuine correlation and threshold method. 2011.

[17] Patrick Danaher, et al. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society, Series B, Statistical Methodology, 2011.

[18] Aristides Gionis, et al. Finding recurrent sources in sequences. RECOMB '03, 2003.

[19] Fabian Mörchen, et al. Extracting interpretable muscle activation patterns with time series knowledge mining. Int. J. Knowl. Based Intell. Eng. Syst., 2005.

[20] Eamonn J. Keogh, et al. A symbolic representation of time series, with implications for streaming algorithms. DMKD '03, 2003.

[21] Heikki Mannila, et al. Rule Discovery from Time Series. KDD, 1998.

[22] P. Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. 1987.

[23] Tim Oates, et al. Visualizing Variable-Length Time Series Motifs. SDM, 2012.

[24] Marco Cuturi, et al. Fast Global Alignment Kernels. ICML, 2011.

[25] Stephen P. Boyd, et al. Greedy Gaussian segmentation of multivariate time series. Advances in Data Analysis and Classification, 2016.

[26] Stephen P. Boyd, et al. Proximal Algorithms. Found. Trends Optim., 2013.

[27] Kazuya Takeda, et al. Driver Modeling Based on Driving Behavior and Its Evaluation in Driver Identification. Proceedings of the IEEE, 2007.

[28] Padhraic Smyth, et al. Clustering Sequences with Hidden Markov Models. NIPS, 1996.

[29] Michael I. Jordan. Graphical Models. 2003.

[30] Thomas M. Cover, et al. Elements of Information Theory. 2005.

[31] Evgenia Dimitriadou. Convex Clustering Methods and Clustering Indexes. 2015.

[32] Eamonn Keogh. Exact Indexing of Dynamic Time Warping. VLDB, 2002.

[33] A. Raftery, et al. Model-based Gaussian and non-Gaussian clustering. 1993.

[34] Saeed Aghabozorgi, et al. A Review of Subsequence Time Series Clustering. TheScientificWorldJournal, 2014.

[35] Adrian E. Raftery, et al. MCLUST Version 3: An R Package for Normal Mixture Modeling and Model-Based Clustering. 2006.

[36] Donald J. Berndt, et al. Using Dynamic Time Warping to Find Patterns in Time Series. KDD Workshop, 1994.

[37] M. Hestenes. Multiplier and gradient methods. 1969.

[38] Eamonn J. Keogh, et al. Clustering of time-series subsequences is meaningless: implications for previous and future research. Knowledge and Information Systems, 2004.

[39] Pradeep Ravikumar, et al. BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables. NIPS, 2013.

[40] Robert M. Gray, et al. Toeplitz and Circulant Matrices: A Review. Found. Trends Commun. Inf. Theory, 2005.

[41] Robert M. Gray, et al. Toeplitz And Circulant Matrices: A Review (Foundations and Trends(R) in Communications and Information Theory). 2006.

[42] Li Wei, et al. Experiencing SAX: a novel symbolic representation of time series. Data Mining and Knowledge Discovery, 2007.

[43] R. Tibshirani, et al. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 2008.

[44] Stephen P. Boyd, et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Found. Trends Mach. Learn., 2011.

[45] Heikki Mannila, et al. Time series segmentation for context recognition in mobile devices. Proceedings 2001 IEEE International Conference on Data Mining, 2001.

[46] Stephen P. Boyd, et al. Convex Optimization. Algorithms and Theory of Computation Handbook, 2004.

[47] M. Yuan, et al. Model selection and estimation in regression with grouped variables. 2006.

[48] Leonhard Held, et al. Gaussian Markov Random Fields: Theory and Applications. 2005.

[49] Su-In Lee, et al. Node-based learning of multiple Gaussian graphical models. J. Mach. Learn. Res., 2013.

Cited By

SummerTime: Variable-length Time Series Summarization with Application to Physical Activity Analysis. ACM Trans. Comput. Heal., 2020.

TATC: Predicting Alzheimer's Disease with Actigraphy Data. KDD, 2018.

Drive2Vec: Multiscale State-Space Embedding of Vehicular Sensor Data. 2018 21st International Conference on Intelligent Transportation Systems (ITSC), 2018.

Forecasting market states. Machine Learning and AI in Finance, 2018.

Subspace clustering for situation assessment in aquatic drones. SAC, 2019.

Spatial-Temporal Demand Forecasting and Competitive Supply via Graph Convolutional Networks. ArXiv, 2020.

Clustering Hashtags Using Temporal Patterns. WISE, 2020.

Early Recognition of Driving Intention for Lane Change Based on Recurrent Hidden Semi-Markov Model. IEEE Transactions on Vehicular Technology, 2020.

Modeling Combinatorial Evolution in Time Series Prediction. ArXiv, 2019.

Pattern Recognition in Multivariate Time Series: Towards an Automated Event Detection Method for Smart Manufacturing Systems. 2020.

Developing Measures of Cognitive Impairment in the Real World from Consumer-Grade Multimodal Sensor Streams. KDD, 2019.

Extensible Lower Bound Function for Dynamic Time Warping. 2019 15th International Conference on Mobile Ad-Hoc and Sensor Networks (MSN), 2019.

Soybean Price Pattern Discovery Via Toeplitz Inverse Covariance-Based Clustering. Int. J. Agric. Environ. Inf. Syst., 2019.

Challenges in Vessel Behavior and Anomaly Detection: From Classical Machine Learning to Deep Learning. Canadian AI, 2020.

Contrast Pattern Mining in Paired Multivariate Time Series of a Controlled Driving Behavior Experiment. ACM Trans. Spatial Algorithms Syst., 2020.

Riding Pattern Recognition for Powered Two-Wheelers Using a Long Short-Term Memory Network. 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), 2020.

Time-Series Event Prediction with Evolutionary State Graph. WSDM, 2020.

Modeling Evolutionary State Graph for Time Series Prediction. 2019.

Steps towards end-to-end neural speaker diarization. (Étapes vers un système neuronal de bout en bout pour la tâche de segmentation et de regroupement en locuteurs). 2019.

Toeplitz Inverse Covariance based Robust Speaker Clustering for Naturalistic Audio Streams. INTERSPEECH, 2019.