AutoCyclone: Automatic Mining of Cyclic Online Activities with Robust Tensor Factorization

Given a collection of seasonal time series, how can we find regular (cyclic) patterns and outliers (i.e., rare events)? These two types of patterns are hidden and mixed together in the time-varying activities. How can we robustly separate regular patterns from outliers without requiring any prior information? We present CycloneM, a unifying model that captures both cyclic patterns and outliers, and CycloneFact, a novel algorithm that solves the above problem. We also present AutoCyclone, an automatic mining framework based on CycloneM and CycloneFact. Our method has the following properties: (a) effective: it captures important cyclic features such as trend and seasonality, and clearly distinguishes regular patterns from rare events; (b) robust and accurate: it detects the above features and patterns accurately even in the presence of outliers; (c) fast: CycloneFact takes time linear in the data size and typically converges in a few iterations; (d) parameter-free: our modeling framework frees the user from having to provide parameter values. Extensive experiments on four real datasets demonstrate the benefits of the proposed model and algorithm: the model captures latent cyclic patterns, trends, and rare events, and the algorithm outperforms existing state-of-the-art approaches. CycloneFact was up to 5 times more accurate and 20 times faster than top competitors.
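To make the idea of separating regular patterns from rare events concrete, here is a minimal sketch of a generic low-rank-plus-sparse decomposition on a matrix of seasonal series. It is not CycloneM or CycloneFact: the function names `soft_threshold` and `robust_low_rank`, the parameters `rank` and `lam`, the fixed iteration count, and the synthetic data are all assumptions made for illustration. Note also that `lam` is set by hand here, unlike the parameter-free framework described above.

```python
import numpy as np

def soft_threshold(x, lam):
    # Shrink entries toward zero; entries with |x| <= lam become exactly 0.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def robust_low_rank(X, rank=1, lam=1.0, n_iter=50, seed=0):
    """Alternating minimization of
    0.5 * ||X - U V^T - O||_F^2 + lam * ||O||_1  (illustrative objective)."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((X.shape[1], rank))
    O = np.zeros_like(X)
    for _ in range(n_iter):
        R = X - O                                  # data with current outliers removed
        U = R @ V @ np.linalg.pinv(V.T @ V)        # least-squares update of U
        V = R.T @ U @ np.linalg.pinv(U.T @ U)      # least-squares update of V
        O = soft_threshold(X - U @ V.T, lam)       # sparse outlier update
    return U @ V.T, O

# Synthetic example: 5 series sharing a period-7 pattern, plus one injected spike.
t = np.arange(28)
pattern = np.sin(2 * np.pi * t / 7)
amplitudes = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.outer(amplitudes, pattern)
X[2, 10] += 20.0                                   # the "rare event"

L, O = robust_low_rank(X, rank=1, lam=1.0)
spike = np.unravel_index(np.argmax(np.abs(O)), O.shape)
print("largest outlier entry at (series, time):", spike)
```

The low-rank part `L` plays the role of the shared cyclic pattern, while the sparse part `O` absorbs isolated deviations. Soft-thresholding biases the recovered outlier magnitudes toward zero by `lam`, so the sketch is suited to locating rare events rather than estimating their exact size.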
