Admissible Time Series Motif Discovery with Missing Data

The discovery of time series motifs has emerged as one of the most useful primitives in time series data mining. Researchers have shown its utility for exploratory data mining, summarization, visualization, segmentation, classification, clustering, and rule discovery. Despite more than a decade of extensive research, there is still no technique that allows the discovery of time series motifs in the presence of missing data, even though missing data is well documented to be ubiquitous in scientific, industrial, and medical datasets. In this work, we introduce a technique for motif discovery in the presence of missing data. We formally prove that our method is admissible, that is, it produces no false negatives. We also show that our method can piggyback on the fastest known motif discovery method with only a small constant-factor overhead in time and space. We demonstrate our approach on diverse datasets with varying amounts of missing data.
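To make the notion of admissibility concrete, the following is a minimal sketch and not the method introduced in this work. It assumes plain (non-normalized) Euclidean distance, missing values encoded as NaN, and a brute-force search; the function names `lower_bound_distance` and `candidate_motif_pairs` are hypothetical. Skipping positions where either value is missing can only drop non-negative terms from the sum of squared differences, so the result never exceeds the true distance however the gaps would be filled, and a threshold search on this bound cannot dismiss a true match.

```python
import math


def lower_bound_distance(a, b):
    """Euclidean distance restricted to positions observed in both inputs.

    Every skipped term is a non-negative squared difference, so this value
    is a lower bound on the distance under any imputation of the NaNs.
    """
    if len(a) != len(b):
        raise ValueError("subsequences must have equal length")
    total = 0.0
    for x, y in zip(a, b):
        if math.isnan(x) or math.isnan(y):
            continue  # missing value: contributes nothing to the bound
        total += (x - y) ** 2
    return math.sqrt(total)


def candidate_motif_pairs(series, m, threshold):
    """Brute-force search over non-overlapping length-m subsequence pairs.

    Returns (i, j, bound) for every pair whose lower bound is within the
    threshold. Because the bound never overestimates, no pair whose true
    distance is within the threshold is excluded (no false negatives).
    False positives are possible and would need a later refinement step.
    """
    n = len(series) - m + 1
    pairs = []
    for i in range(n):
        for j in range(i + m, n):  # j >= i + m avoids trivial matches
            bound = lower_bound_distance(series[i:i + m], series[j:j + m])
            if bound <= threshold:
                pairs.append((i, j, bound))
    return pairs
```

This sketch runs in quadratic time and ignores z-normalization, which the motif discovery literature typically requires; it is meant only to show why a lower bound yields a search without false negatives.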
