A Self-Learning and Online Algorithm for Time Series Anomaly Detection, with Application in CPU Manufacturing

Anomaly detection in time series has received considerable attention over the past two decades. However, existing techniques either cannot locate the anomalies within an anomalous time series or require users to specify the length of potential anomalies. To address these limitations, we propose a self-learning online anomaly detection algorithm that automatically identifies anomalous time series, as well as the exact locations of the anomalies within each detected series. We evaluate our approach on several real datasets, including two CPU manufacturing datasets from Intel. We demonstrate that our approach can successfully detect the correct anomalies without requiring any prior knowledge about the data.
