Discovering Multivariate Motifs using Subsequence Density Estimation and Greedy Mixture Learning

The problem of locating motifs in real-valued, multivariate time series data involves discovering sets of recurring patterns embedded in the time series. Each set is composed of several non-overlapping subsequences and constitutes a motif because all of the included subsequences are similar. The ability to discover such motifs automatically allows intelligent systems to form endogenously meaningful representations of their environment through unsupervised sensor analysis. In this paper, we formulate a unifying view of motif discovery as the problem of locating regions of high density in the space of all time series subsequences. Our approach is efficient (sub-quadratic in the length of the data), requires fewer user-specified parameters than previous methods, and naturally allows variable-length motif occurrences and non-linear temporal warping. We evaluate our approach on four data sets from different domains, including on-body inertial sensors and speech.
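As a rough illustration of the "regions of high density in subsequence space" view, the Python sketch below estimates an unnormalized Gaussian kernel density at every fixed-length, flattened subsequence and treats the densest one as a motif seed. It is not the authors' algorithm: the fixed window length `w`, the Gaussian kernel with a hand-picked `bandwidth`, the `radius` membership threshold, the trivial-match exclusion rule, and all function names are hypothetical choices made for illustration. The brute-force pairwise loop is also quadratic in the number of windows, whereas the abstract reports a sub-quadratic method that additionally handles variable-length occurrences and temporal warping, neither of which this sketch attempts.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one multivariate sample (one value per sensor channel)


def subsequences(series: List[Frame], w: int) -> List[List[float]]:
    """Flatten every length-w sliding window into a single vector."""
    return [[x for frame in series[i:i + w] for x in frame]
            for i in range(len(series) - w + 1)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two flattened subsequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def density_scores(subs: List[List[float]], bandwidth: float, w: int) -> List[float]:
    """Unnormalized Gaussian kernel density at each subsequence (hypothetical helper).

    Windows whose start indices differ by less than w overlap in time
    (trivial matches), so they do not contribute to each other's density.
    """
    scores = []
    for i, a in enumerate(subs):
        total = 0.0
        for j, b in enumerate(subs):
            if abs(i - j) < w:
                continue
            total += math.exp(-sq_dist(a, b) / (2.0 * bandwidth ** 2))
        scores.append(total)
    return scores


def densest_motif(series: List[Frame], w: int, bandwidth: float, radius: float) -> List[int]:
    """Return start indices of non-overlapping windows near the densest window."""
    subs = subsequences(series, w)
    scores = density_scores(subs, bandwidth, w)
    seed = max(range(len(subs)), key=scores.__getitem__)
    # Visit windows from nearest to farthest from the seed (the seed is at distance 0).
    order = sorted(range(len(subs)), key=lambda j: sq_dist(subs[seed], subs[j]))
    members: List[int] = []
    for j in order:
        if sq_dist(subs[seed], subs[j]) > radius ** 2:
            break  # every remaining window is at least this far away
        if all(abs(j - m) >= w for m in members):
            members.append(j)  # keep only occurrences that do not overlap in time
    return sorted(members)


# Toy 2-channel series: a 3-sample pattern planted at t = 5, 15, 25 on a
# background whose values grow with t, so background windows are far apart.
pattern = [(1.0, 0.0), (2.0, 1.0), (1.0, 0.0)]
example_series = [(10.0 * t, -10.0 * t) for t in range(30)]
for start in (5, 15, 25):
    example_series[start:start + 3] = pattern

print(densest_motif(example_series, w=3, bandwidth=1.0, radius=1.0))  # → [5, 15, 25]
```

Excluding windows that overlap in time mirrors the requirement that a motif's occurrences be non-overlapping; without it, every window would gain density from its own shifted neighbours.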
