AutoPlait: automatic mining of co-evolving time sequences

Given a large collection of co-evolving time-series containing an unknown number of patterns of different durations, how can we efficiently and effectively find typical patterns and the points of variation? How can we statistically summarize all the sequences and achieve a meaningful segmentation? In this paper we present AutoPlait, a fully automatic mining algorithm for co-evolving time sequences. Our method has the following properties: (a) effectiveness: it operates on large collections of time-series and finds similar segment groups that agree with human intuition; (b) scalability: it is linear in the input size, and thus scales up very well; and (c) parameter-free: it requires no user intervention, no prior training, and no parameter tuning. Extensive experiments on 67GB of real datasets demonstrate that AutoPlait detects meaningful patterns correctly and outperforms state-of-the-art competitors in both accuracy and speed: AutoPlait achieves near-perfect precision and recall (over 95%), and it is up to 472 times faster than its competitors.
