Rare Time Series Motif Discovery from Unbounded Streams

The detection of time series motifs, which are approximately repeated subsequences in time series streams, has been shown to have great utility as a subroutine in many higher-level data mining algorithms. However, this detection becomes much harder when the motifs of interest are vanishingly rare or when the data arrive as a never-ending stream. In this work we investigate algorithms to find such rare motifs. We demonstrate that, under reasonable assumptions, we must abandon any hope of an exact solution to the motif problem as it is normally defined; however, we introduce algorithms that allow us to solve the underlying problem with high probability.
