Exact Discovery of Time Series Motifs

Time series motifs are pairs of individual time series, or subsequences of a longer time series, which are very similar to each other. As with their discrete analogues in computational biology, this similarity hints at structure which has been conserved for some reason and may therefore be of interest. Since the formalization of time series motifs in 2002, dozens of researchers have used them for diverse applications in many different domains. Because the obvious algorithm for computing motifs is quadratic in the number of items, more than a dozen approximate algorithms to discover motifs have been proposed in the literature. In this work, for the first time, we show a tractable exact algorithm to find time series motifs. As we shall show through extensive experiments, our algorithm is up to three orders of magnitude faster than brute-force search on large datasets. We further show that our algorithm is fast enough to be used as a subroutine in higher-level data mining algorithms for anytime classification, near-duplicate detection and summarization, and we consider detailed case studies in domains as diverse as electroencephalograph interpretation and entomological telemetry data mining.
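For orientation, the quadratic brute-force baseline mentioned above can be sketched as follows. This is a minimal illustration, not the exact algorithm proposed in this work: it assumes the input is a list of equal-length time series and that similarity is measured by Euclidean distance, and the function names `euclidean` and `brute_force_motif` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_motif(series):
    """Return (i, j, distance) for the closest pair of series.

    Compares every pair once, so the number of distance
    computations grows quadratically with len(series).
    """
    best_i, best_j, best_d = None, None, math.inf
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            d = euclidean(series[i], series[j])
            if d < best_d:
                best_i, best_j, best_d = i, j, d
    return best_i, best_j, best_d


if __name__ == "__main__":
    data = [
        [0, 1, 2, 3],
        [5, 5, 5, 5],
        [0, 1, 2, 4],
        [9, 8, 7, 6],
    ]
    print(brute_force_motif(data))
```

With n series the double loop performs n(n-1)/2 distance computations, which is the cost that the approximate methods in the literature, and the exact algorithm of this work, aim to avoid.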
