Matrix Profile XIII: Time Series Snippets: A New Primitive for Time Series Data Mining

Perhaps the most basic query made by a data analyst confronting a new data source is "Show me some representative/typical data." Answering this question is trivial in many domains, but, surprisingly, it is very difficult for large time series datasets. The major difficulty is not time or space complexity, but defining what it means for data to be representative in this domain. In this work, we show that the obvious candidate definitions (motifs, shapelets, cluster centers, random samples, etc.) are all poor choices. Thus motivated, we introduce time series snippets, a novel representation of typical time series subsequences. Beyond their utility for visualizing and summarizing massive time series collections, we show that time series snippets are useful for high-level comparison of large time series collections.
