Introducing time series chains: a new primitive for time series data mining

Abstract

Time series motifs were introduced in 2002 and have since become a fundamental tool for time series analytics, finding diverse uses in dozens of domains. In this work, we introduce Time Series Chains, which are related to, but distinct from, time series motifs. Informally, a time series chain is a temporally ordered set of subsequence patterns, such that each pattern is similar to the pattern that preceded it, but the first and last patterns can be arbitrarily dissimilar. In the discrete space, this is similar to extracting the text chain “data, date, cate, cade, code” from a text stream. The first and last words have nothing in common, yet they are connected by a chain of words in which each word differs only slightly from its neighbor. Time series chains can capture the evolution of systems and help predict the future. As such, they potentially have implications for prognostics. In this work, we introduce two robust definitions of time series chains and scalable algorithms that allow us to discover them in massive, complex datasets.
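The discrete analogy above can be made concrete with a small sketch. This is only an illustration of the text-chain intuition, not the chain definitions or algorithms introduced in the paper. The function names `hamming` and `longest_chain`, the `max_diff` threshold, the filler words in the sample stream, and the choice of letter-substitution (Hamming) distance as the notion of "small difference" are all assumptions made for this example.

```python
def hamming(a, b):
    """Count differing positions; words of unequal length are treated as far apart."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def longest_chain(stream, max_diff=1):
    """Return the longest temporally ordered chain of words in which each word
    is within max_diff letter substitutions of the word before it.

    Illustrative O(n^2) dynamic program over a word stream (hypothetical helper).
    """
    n = len(stream)
    if n == 0:
        return []
    best_len = [1] * n   # best_len[j]: length of the longest chain ending at j
    prev = [-1] * n      # prev[j]: index of the predecessor of j in that chain
    for j in range(n):
        for i in range(j):  # only earlier words may precede word j
            d = hamming(stream[i], stream[j])
            if 0 < d <= max_diff and best_len[i] + 1 > best_len[j]:
                best_len[j] = best_len[i] + 1
                prev[j] = i
    # Walk back from the end of the longest chain.
    end = max(range(n), key=best_len.__getitem__)
    chain = []
    while end != -1:
        chain.append(stream[end])
        end = prev[end]
    return chain[::-1]


# A toy stream: the chain words are interleaved with unrelated filler words.
stream = ["data", "lamp", "date", "ring", "cate", "cade", "moss", "code"]
chain = longest_chain(stream)
print(chain)                          # ['data', 'date', 'cate', 'cade', 'code']
print(hamming(chain[0], chain[-1]))   # 4: the endpoints share no letter position
```

Each link in the recovered chain changes a single letter, yet the first and last words differ in every position, which mirrors the property that a chain's endpoints can be arbitrarily dissimilar.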