Matrix Profile XXI: A Geometric Approach to Time Series Chains Improves Robustness

Time series motifs have become a fundamental tool for characterizing repeated and conserved structure in systems such as manufacturing telemetry, economic activities, and both human physiological and cultural behaviors. Recently, time series chains were introduced as a generalization of time series motifs to represent evolving patterns in time series, in order to characterize the evolution of systems. Time series chains are a very promising primitive; however, we have observed that the original definition can be brittle, in the sense that a small fluctuation in the time series may "cut" a chain. Furthermore, the original definition does not provide a measure of the "significance" of a chain, and therefore can neither support top-k search for chains nor provide a mechanism to discard spurious chains that might be discovered when searching large datasets. Inspired by observations from dynamical systems theory, this paper introduces two novel quality metrics for time series chains, directionality and graduality, to improve robustness and to enable top-k search. With extensive empirical work, we show that our proposed definition is much more robust to the vagaries of real-world datasets and allows us to find unexpected regularities in time series datasets.
