Multiresolution Motif Discovery in Time Series

Time series motif discovery is an important problem with applications in a variety of areas that range from telecommunications to medicine. Several algorithms have been proposed to solve the problem. However, these algorithms heavily use expensive random disk accesses or assume the data can fit into main memory. They only consider motifs at a single resolution and are not suited to interactivity. In this work, we tackle the motif discovery problem as an approximate Top-K frequent subsequence discovery problem. We fully exploit state of the art iSAX representation multiresolution capability to obtain motifs at different resolutions. This property yields interactivity, allowing the user to navigate along the Top-K motifs structure. This permits a deeper understanding of the time series database. Further, we apply the Top-K space saving algorithm to our frequent subsequences approach. A scalable algorithm is obtained that is suitable for data stream like applications where small memory devices such as sensors are used. Our approach is scalable and disk-efficient since it only needs one single pass over the time series database. We provide empirical evidence of the validity of the algorithm in datasets from different areas that aim to represent practical applications.

[1]  Jeremy Buhler,et al.  Finding motifs using random projections , 2001, RECOMB.

[2]  Eamonn J. Keogh,et al.  Disk aware discord discovery: finding unusual time series in terabyte sized datasets , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[3]  Kuniaki Uehara,et al.  Discovery of Time-Series Motif from Multi-Dimensional Data Based on MDL Principle , 2005, Machine Learning.

[4]  Jessica Lin,et al.  Finding Motifs in Time Series , 2002, KDD 2002.

[5]  Eamonn J. Keogh,et al.  A symbolic representation of time series, with implications for streaming algorithms , 2003, DMKD '03.

[6]  Divyakant Agrawal,et al.  Efficient Computation of Frequent and Top-k Elements in Data Streams , 2005, ICDT.

[7]  Paulo J. Azevedo,et al.  Mining Approximate Motifs in Time Series , 2006, Discovery Science.

[8]  Eamonn J. Keogh,et al.  iSAX: indexing and mining terabyte sized time series , 2008, KDD.

[9]  Hui Ding,et al.  Querying and mining of time series data: experimental comparison of representations and distance measures , 2008, Proc. VLDB Endow..

[10]  Eamonn J. Keogh,et al.  Detecting time series motifs under uniform scaling , 2007, KDD '07.

[11]  Paulo J. Azevedo,et al.  Detection of Hydrophobic Clusters in Molecular Dynamics Protein Unfolding Simulations Using Association Rules , 2005, ISBMDA.

[12]  Paulo J. Azevedo,et al.  Evaluating Protein Motif Significance Measures: A Case Study on Prosite Patterns , 2007, 2007 IEEE Symposium on Computational Intelligence and Data Mining.

[13]  Dimitrios Gunopulos,et al.  A Wavelet-Based Anytime Algorithm for K-Means Clustering of Time Series , 2003 .

[14]  Irfan A. Essa,et al.  Detecting Subdimensional Motifs: An Efficient Algorithm for Generalized Multivariate Pattern Discovery , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[15]  Oscar Gama,et al.  An improved MAC protocol with a reconfiguration scheme for wireless e-health systems requiring quality of service , 2009, 2009 1st International Conference on Wireless Communication, Vehicular Technology, Information Theory and Aerospace & Electronic Systems Technology.

[16]  Eamonn J. Keogh,et al.  Probabilistic discovery of time series motifs , 2003, KDD '03.

[17]  Li Wei,et al.  Experiencing SAX: a novel symbolic representation of time series , 2007, Data Mining and Knowledge Discovery.

[18]  Eamonn J. Keogh,et al.  VizTree: a Tool for Visually Mining and Monitoring Massive Time Series Databases , 2004, VLDB.

[19]  Eamonn J. Keogh,et al.  Exact Discovery of Time Series Motifs , 2009, SDM.

[20]  Tim Oates,et al.  PERUSE: An unsupervised algorithm for finding recurring patterns in time series , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[21]  Eamonn J. Keogh,et al.  On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration , 2002, Data Mining and Knowledge Discovery.

[22]  Eamonn J. Keogh,et al.  Three Myths about Dynamic Time Warping Data Mining , 2005, SDM.