Efficient Discovery of Variable-length Time Series Motifs with Large Length Range in Million Scale Time Series

Detecting repeated variable-length patterns, also called variable-length motifs, has received a great deal of attention in recent years. The current state-of-the-art algorithm uses a fixed-length motif discovery algorithm as a subroutine to enumerate variable-length motifs. As a result, it may take hours or days to execute when the enumeration range is large. In this work, we introduce an approximate algorithm called HierarchIcal based Motif Enumeration (HIME) to detect variable-length motifs with a large enumeration range in million-scale time series. We show in the experiments that the proposed algorithm scales significantly better than the state-of-the-art algorithm. Moreover, the motif length range detected by HIME is considerably larger than that of previous sequence-matching-based approximate variable-length motif discovery approaches. We demonstrate that HIME can efficiently detect meaningful variable-length motifs in long, real-world time series.
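To make the cost of enumeration by a fixed-length subroutine concrete, the following is a minimal sketch under stated assumptions. It is neither HIME nor the state-of-the-art algorithm referred to above: the subroutine here is a plain brute-force closest-pair search over z-normalized subsequences, and the names `znorm`, `fixed_length_motif`, and `enumerate_motifs` are hypothetical. It only illustrates that the subroutine runs once per candidate length, so total work grows with the width of the enumeration range.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence; near-constant subsequences map to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    var = sum((x - mean) ** 2 for x in seq) / n
    if var < 1e-12:
        return [0.0] * n
    std = math.sqrt(var)
    return [(x - mean) / std for x in seq]


def fixed_length_motif(series, m):
    """Brute-force search for the closest non-overlapping pair of length m.

    Returns (distance, i, j) with i < j, or (inf, None, None) if no
    non-overlapping pair exists. Illustrative subroutine only.
    """
    subs = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (math.inf, None, None)
    for i in range(len(subs)):
        # Starting j at i + m skips overlapping (trivial) matches.
        for j in range(i + m, len(subs)):
            d = math.dist(subs[i], subs[j])
            if d < best[0]:
                best = (d, i, j)
    return best


def enumerate_motifs(series, min_len, max_len):
    """Call the fixed-length subroutine once for every length in the range."""
    return {m: fixed_length_motif(series, m)
            for m in range(min_len, max_len + 1)}


if __name__ == "__main__":
    # Toy series with the shape [0, 1, 3, 1, 0] planted at offsets 0 and 9.
    series = [0, 1, 3, 1, 0, 5, 5, 5, 5, 0, 1, 3, 1, 0, 7, 2, 9]
    for length, (dist, i, j) in enumerate_motifs(series, 4, 5).items():
        print(length, round(dist, 3), i, j)
```

Each call to the brute-force subroutine is quadratic in the series length, and the enumeration repeats it for every length in the range, which is why a wide range on a long series becomes expensive in this naive scheme.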
