Similarity Preserving Representation Learning for Time Series Analysis

Many machine learning algorithms take instance-feature matrices as their inputs. As such, they cannot directly analyze time series data because of its temporal nature, its usually unequal lengths, and its complex properties. This is unfortunate, since many of these algorithms are effective, robust, efficient, and easy to use. In this paper, we bridge this gap by proposing an efficient representation learning framework that converts a set of time series with equal or unequal lengths into a matrix format. In particular, we guarantee that the pairwise similarities between time series are well preserved after the transformation. The learned feature representation is therefore particularly suitable for learning problems that are sensitive to data similarities. Given a set of n time series, we first construct an n×n partially observed similarity matrix by randomly sampling O(n log n) pairs of time series and computing their pairwise similarities. We then propose an extremely efficient algorithm that solves a highly non-convex and NP-hard problem to learn new features from the partially observed similarity matrix. We use the learned features to conduct experiments on both classification and clustering tasks. Our extensive experimental results demonstrate that the proposed framework is both effective and efficient.
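
The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the abstract names neither the similarity measure nor the optimization algorithm, so the sketch uses dynamic time warping with a hypothetical similarity 1 / (1 + DTW distance), and it swaps in plain stochastic gradient descent on the factorization objective, the sum over observed pairs (i, j) of (S_ij − x_i · x_j)², in place of the paper's algorithm. All function names, the sampling constant `c`, the feature dimension `dim`, and the step size `lr` are illustrative choices.

```python
import math
import random


def dtw_distance(a, b):
    """Classic dynamic time warping distance; works for unequal lengths."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = abs(x - y) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Hypothetical similarity in (0, 1]; an assumption, not the paper's choice."""
    return 1.0 / (1.0 + dtw_distance(a, b))


def sample_similarities(series, c=2.0, seed=0):
    """Observe roughly c * n * log(n) random pairs (capped at all pairs)."""
    rng = random.Random(seed)
    n = len(series)
    total = n * (n - 1) // 2
    m = min(total, math.ceil(c * n * math.log(max(n, 2))))
    pairs = set()
    while len(pairs) < m:
        i, j = rng.sample(range(n), 2)
        pairs.add((min(i, j), max(i, j)))
    return {(i, j): similarity(series[i], series[j]) for i, j in sorted(pairs)}


def observed_loss(X, observed):
    """Squared reconstruction error on the observed entries only."""
    return sum(
        (sum(p * q for p, q in zip(X[i], X[j])) - s) ** 2
        for (i, j), s in observed.items()
    )


def learn_features(observed, n, dim=3, steps=500, lr=0.05, seed=0):
    """Fit X (n x dim) so that x_i . x_j approximates each observed S_ij.

    Uses plain SGD as a stand-in; the paper's own solver is not shown here.
    """
    rng = random.Random(seed)
    X = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        for (i, j), s in observed.items():
            xi, xj = X[i], X[j]
            err = sum(p * q for p, q in zip(xi, xj)) - s
            for k in range(dim):
                gi, gj = err * xj[k], err * xi[k]  # gradients from old values
                xi[k] -= lr * gi
                xj[k] -= lr * gj
    return X
```

Calling `sample_similarities(series)` on a list of numeric sequences and passing the result to `learn_features(observed, len(series))` yields one fixed-length feature vector per time series. Those rows form the instance-feature matrix that a standard classifier or clustering algorithm can consume.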
