Visualizing Variable-Length Time Series Motifs

The problem of time series motif discovery has received considerable attention from researchers in the past decade. Most existing work on finding time series motifs requires that the length of the motifs be known in advance. However, such information is not always available. In addition, motifs of different lengths may co-exist in a time series dataset. In this work, we develop a motif visualization system based on grammar induction. We demonstrate that grammar induction in time series can effectively identify repeated patterns without prior knowledge of their lengths. The motifs discovered by the visualization system are variable-length in two ways: different motifs may differ in length from one another (inter-motif), and the subsequences that make up a single motif are not restricted to have identical length (intra-motif). The latter is a desirable property that has not previously been seen in the literature.
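To make the idea of finding repeated patterns without fixing their length more concrete, the following is a minimal sketch, not the system's actual implementation. It assumes a crude symbolic discretization of the series followed by a simple greedy pair-replacement grammar (a Re-Pair-style stand-in chosen here for brevity; the abstract does not specify which grammar induction algorithm is used). The function names `discretize`, `induce_grammar`, and `expand`, the breakpoints, and the alphabet are all hypothetical choices made for illustration.

```python
from collections import Counter


def discretize(series, breakpoints=(-0.43, 0.43), alphabet="abc"):
    """Map each z-normalized value to a letter (assumed, simplified symbolization)."""
    n = len(series)
    mean = sum(series) / n
    # Fall back to 1.0 for a constant series to avoid division by zero.
    std = (sum((x - mean) ** 2 for x in series) / n) ** 0.5 or 1.0
    symbols = []
    for x in series:
        z = (x - mean) / std
        # The letter index is the number of breakpoints lying below z.
        symbols.append(alphabet[sum(z > b for b in breakpoints)])
    return symbols


def induce_grammar(tokens, min_count=2):
    """Greedily replace the most frequent adjacent pair with a new rule.

    Illustrative stand-in for grammar induction: each rule names a repeated
    pattern, and nested rules expand to patterns of different lengths.
    Terminals are assumed to be single characters, so rule names such as
    "R1" cannot collide with them.
    """
    tokens = list(tokens)
    rules = {}
    while True:
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            break
        name = f"R{len(rules) + 1}"
        rules[name] = pair
        # Rewrite the sequence left to right, replacing non-overlapping occurrences.
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens, rules


def expand(symbol, rules):
    """Recursively expand a rule name back into its string of terminal letters."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


# Usage: rules found on a toy symbol sequence; each rule is a candidate motif.
compressed, rules = induce_grammar("abcabcabc")
candidate_motifs = {name: expand(name, rules) for name in rules}
```

In this sketch no pattern length is supplied anywhere: the lengths of the candidate motifs fall out of how deeply the rules nest. A rule's occurrences in the symbol sequence would then be mapped back to subsequences of the original series for display.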
