Finding semantics in time series

To understand a complex system, we analyze its output or its log data. For example, we track a system's resource consumption (CPU, memory, message queues of different types, etc.) to help avert system failures; we examine economic indicators to assess the severity of a recession; we monitor a patient's heart rate or EEG for disease diagnosis. Time series data is involved in many such applications. Much work has been devoted to discovering patterns in time series data, but little of it has attempted to use the data to unveil a system's internal dynamics. In this paper, we go beyond learning patterns from time series data. We focus on obtaining a better understanding of the mechanism that generates the data, and we regard patterns and their temporal relations as organic components of that hidden mechanism. Specifically, we propose to model time series data using a novel pattern-based hidden Markov model (pHMM), which aims to reveal a global picture of the system that generates the time series data. We propose an iterative approach to refine pHMMs learned from the data. In each iteration, we use the current pHMM to guide time series segmentation and clustering, which in turn enables us to learn a more accurate pHMM. Furthermore, we propose three pruning strategies to speed up the refinement process. Empirical results on real datasets demonstrate the feasibility and effectiveness of the proposed approach.
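To make the idea of patterns and their temporal relations concrete, the following is a minimal sketch and not the pHMM algorithm itself. It assumes fixed-length segmentation, a single slope feature per segment, and plain 1-D k-means, none of which are specified in the abstract. It shows one pass of segmenting a series, clustering the segments into pattern states, and counting transitions between consecutive states; the function names and the toy series are hypothetical.

```python
from collections import Counter


def segment_slopes(series, width):
    """Cut the series into non-overlapping windows of `width` points
    and summarize each window by its end-to-end slope (an assumption)."""
    slopes = []
    for start in range(0, len(series) - width + 1, width):
        window = series[start:start + width]
        slopes.append((window[-1] - window[0]) / (width - 1))
    return slopes


def kmeans_1d(values, k, iterations=20):
    """Plain 1-D k-means; initial centers are evenly spaced quantiles."""
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def assign():
        # Each value gets the index of its nearest center.
        return [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]

    for _ in range(iterations):
        labels = assign()
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # leave an empty cluster's center unchanged
                centers[c] = sum(members) / len(members)
    return assign(), centers


def transition_matrix(labels, k):
    """Row-normalized counts of transitions between consecutive labels."""
    counts = Counter(zip(labels, labels[1:]))
    matrix = []
    for a in range(k):
        total = sum(counts[(a, b)] for b in range(k))
        matrix.append([counts[(a, b)] / total if total else 0.0 for b in range(k)])
    return matrix


# Toy series: a rising run of four points followed by a falling run, repeated.
series = [0, 1, 2, 3, 3, 2, 1, 0] * 3
labels, centers = kmeans_1d(segment_slopes(series, width=4), k=2)
transitions = transition_matrix(labels, k=2)
print(labels)
print(transitions)
```

On this toy series the two pattern states strictly alternate, so each row of the transition matrix puts all of its mass on the other state. An iterative scheme like the one described above would feed such a model back into the segmentation and clustering steps; this sketch stops after the first pass.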
