Exact variable-length anomaly detection algorithm for univariate and multivariate time series

Anomaly detection in time series has received considerable attention over the past two decades. However, existing techniques either cannot locate the anomalies within an anomalous time series or require users to specify the length of potential anomalies in advance. To address these limitations, we propose a self-learning online anomaly detection algorithm that automatically identifies anomalous time series as well as the exact locations at which the anomalies occur within them. Detecting anomalies in multivariate time series poses three additional challenges. First, anomalies may occur in only a subset of dimensions (variables). Second, the locations and lengths of anomalous subsequences may differ across dimensions. Third, some anomalies look normal in each individual dimension and become apparent only when dimensions are considered in combination. To address these challenges, we introduce a multivariate anomaly detection algorithm that detects anomalies and identifies the dimensions and locations of the anomalous subsequences. We evaluate our approaches on several real-world datasets, including two CPU manufacturing datasets from Intel. The results show that our approach can detect the correct anomalies without requiring any prior knowledge about the data.
