Facets: Fast Comprehensive Mining of Coevolving High-order Time Series

Mining time series data has been a very active research area in the past decade, exactly because of its prevalence in many high-impact applications, ranging from environmental monitoring, intelligent transportation systems, computer network forensics, to smart buildings and many more. It has posed many fascinating research questions. Among others, three prominent challenges shared by a variety of real applications are (a) high-order; (b) contextual constraints and (c) temporal smoothness. The state-of-the-art mining algorithms are rich in addressing each of these challenges, but relatively short of comprehensiveness in attacking the coexistence of multiple or even all of these three challenges. In this paper, we propose a comprehensive method, FACETS, to simultaneously model all these three challenges. We formulate it as an optimization problem from a dynamic graphical model perspective. The key idea is to use tensor factorization to address multi-aspect challenges, and perform careful regularizations to attack both contextual and temporal challenges. Based on that, we propose an effective and scalable algorithm to solve the problem. Our experimental evaluations on three real datasets demonstrate that our method (1) outperforms its competitors in two common data mining tasks (imputation and prediction); and (2) enjoys a linear scalability w.r.t. the length of time series.

[1]  Jimeng Sun,et al.  Streaming Pattern Discovery in Multiple Time-Series , 2005, VLDB.

[2]  Hanghang Tong,et al.  Fast Mining of a Network of Coevolving Time Series , 2015, SDM.

[3]  Lei Li,et al.  Multilinear Dynamical Systems for Tensor Time Series , 2013, NIPS.

[4]  Jimeng Sun,et al.  Beyond streams and graphs: dynamic tensor analysis , 2006, KDD '06.

[5]  J. Chang,et al.  Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart-Young” decomposition , 1970 .

[6]  Jae-Gil Lee,et al.  Trajectory Outlier Detection: A Partition-and-Detect Framework , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[7]  Ruslan Salakhutdinov,et al.  Probabilistic Matrix Factorization , 2007, NIPS.

[8]  Christos Faloutsos,et al.  AutoPlait: automatic mining of co-evolving time sequences , 2014, SIGMOD Conference.

[9]  A. Rukhin Matrix Variate Distributions , 1999, The Multivariate Normal Distribution.

[10]  Richard A. Harshman,et al.  Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-model factor analysis , 1970 .

[11]  Xi Chen,et al.  Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization , 2010, SDM.

[12]  Eamonn J. Keogh,et al.  Time series shapelets: a new primitive for data mining , 2009, KDD.

[13]  Yehuda Koren,et al.  Matrix Factorization Techniques for Recommender Systems , 2009, Computer.

[14]  T. Kolda Multilinear operators for higher-order decompositions , 2006 .

[15]  Ee-Peng Lim,et al.  Modeling Temporal Adoptions Using Dynamic Matrix Factorization , 2013, 2013 IEEE 13th International Conference on Data Mining.

[16]  Haiping Lu,et al.  Multilinear Principal Component Analysis of Tensor Objects for Recognition , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[17]  Tamara G. Kolda,et al.  Tensor Decompositions and Applications , 2009, SIAM Rev..

[18]  Qi Yu,et al.  Fast Multivariate Spatio-temporal Analysis via Low Rank Tensor Learning , 2014, NIPS.

[19]  Christos Faloutsos,et al.  DynaMMo: mining and summarization of coevolving sequences with missing values , 2009, KDD.

[20]  Feng Xu,et al.  Dual-Regularized One-Class Collaborative Filtering , 2014, CIKM.

[21]  L. Tucker,et al.  Some mathematical notes on three-mode factor analysis , 1966, Psychometrika.

[22]  Kush R. Varshney,et al.  Dynamic matrix factorization: A state space approach , 2011, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[23]  Philip S. Yu,et al.  Classifying Data Streams with Skewed Class Distributions and Concept Drifts , 2008, IEEE Internet Computing.

[24]  Eamonn J. Keogh,et al.  Classification of Multi-dimensional Streaming Time Series by Weighting Each Classifier's Track Record , 2013, 2013 IEEE 13th International Conference on Data Mining.

[25]  Eamonn J. Keogh,et al.  iSAX: indexing and mining terabyte sized time series , 2008, KDD.

[26]  Michael R. Lyu,et al.  SoRec: social recommendation using probabilistic matrix factorization , 2008, CIKM '08.