Implications of Z-Normalization in the Matrix Profile

Companies are increasingly measuring their products and services, producing a growing volume of time series data and a corresponding need for techniques that extract usable information from it. One state-of-the-art technique for time series is the Matrix Profile, which has been used for various applications including motif and discord discovery, visualization, and semantic segmentation. Internally, the Matrix Profile uses the z-normalized Euclidean distance to compare the shape of subsequences between two series. However, when the compared subsequences are relatively flat and contain noise, the resulting distance is high despite their visual similarity. This property violates some of the assumptions made by Matrix Profile based techniques, which therefore perform worse when series contain flat and noisy subsequences. By studying the properties of the z-normalized Euclidean distance, we derived a method to eliminate this effect that requires only an estimate of the standard deviation of the noise. In this paper we describe various practical properties of the z-normalized Euclidean distance and show how they can be used to correct the performance of Matrix Profile related techniques. We demonstrate our technique on anomaly detection using a Yahoo! Webscope anomaly dataset, on semantic segmentation using the PAMAP2 activity dataset, and on data visualization using a UCI activity dataset, all containing real-world data, and obtain overall better results after applying it. Our technique is a straightforward extension of the distance calculation in the Matrix Profile and will benefit any derived technique dealing with time series containing flat and noisy subsequences.
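
The effect described above can be reproduced with a few lines of Python. The following is a minimal sketch for illustration only, not the paper's correction method: it compares two noisy copies of a sine wave and two noisy copies of a flat line under the z-normalized Euclidean distance. The subsequence length `m`, the noise level `noise_std`, and the helper names `znorm`, `znorm_euclidean` and `noisy` are assumptions chosen for the example.

```python
import math
import random


def znorm(x):
    """Return x rescaled to zero mean and unit (population) standard deviation."""
    m = len(x)
    mean = sum(x) / m
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / m)
    if std == 0:
        raise ValueError("cannot z-normalize a constant subsequence")
    return [(v - mean) / std for v in x]


def znorm_euclidean(x, y):
    """Euclidean distance between the z-normalized versions of x and y."""
    zx, zy = znorm(x), znorm(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(zx, zy)))


rng = random.Random(0)   # fixed seed so the run is repeatable
m = 100                  # subsequence length (assumed for illustration)
noise_std = 0.05         # noise level (assumed for illustration)


def noisy(signal):
    """Add independent Gaussian noise to every sample of the signal."""
    return [v + rng.gauss(0.0, noise_std) for v in signal]


sine = [math.sin(2 * math.pi * i / m) for i in range(m)]
flat = [0.0] * m

# Two noisy copies of the same shape: the noise is small relative to the signal.
d_sine = znorm_euclidean(noisy(sine), noisy(sine))

# Two noisy copies of a flat line: z-normalization scales the noise up to unit variance.
d_flat = znorm_euclidean(noisy(flat), noisy(flat))

# Upper bound of the z-normalized Euclidean distance for length m.
d_max = 2 * math.sqrt(m)

print(f"sine vs sine: {d_sine:.2f}")
print(f"flat vs flat: {d_flat:.2f}")
print(f"maximum     : {d_max:.2f}")
```

Both pairs look alike to the eye, yet the flat pair yields a much larger distance. Z-normalization divides each subsequence by its own standard deviation, which for a flat subsequence is only the noise, so the noise is amplified until it dominates the comparison.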
