A Cluster-based Algorithm for Anomaly Detection in Time Series Using Mahalanobis Distance

We propose an unsupervised learning algorithm for anomaly detection in time series data that is based on clustering and uses the Mahalanobis distance function. After a brief review of the main recent contributions to this research field, we give a formal, detailed description of the algorithm and discuss how to set its parameters. To evaluate its effectiveness, we applied the algorithm to a real case and compared its results with those of another technique that targets the same problem. The results suggest that the proposal can be applied successfully to detect anomalies in time series.
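For readers unfamiliar with the distance function named above, the following is a minimal sketch of the Mahalanobis distance between a point and a cluster of two-dimensional points. It is not the algorithm proposed in the paper, whose details are not given in this abstract. The function names, the restriction to two dimensions, and the use of the sample covariance are assumptions made for illustration only.

```python
import math


def mean_and_cov_2d(points):
    """Return the mean and sample covariance (sxx, sxy, syy) of 2-D points.

    Hypothetical helper for illustration; needs at least two points.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance sqrt((x - m)^T S^-1 (x - m)) in two dimensions."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular or not positive definite")
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # The inverse of [[sxx, sxy], [sxy, syy]] is (1/det) * [[syy, -sxy], [-sxy, sxx]].
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)


# Example cluster: the four corners of a square centred at (1, 1).
cluster = [(0, 0), (2, 0), (0, 2), (2, 2)]
mean, cov = mean_and_cov_2d(cluster)
print(mahalanobis_2d((1, 1), mean, cov))  # distance of the centroid itself
print(mahalanobis_2d((3, 1), mean, cov))  # distance of a point outside the square
```

Unlike the Euclidean distance, this measure rescales each direction by the spread of the cluster, so a point far from the centroid along a direction of low variance is treated as more unusual than one equally far along a direction of high variance.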
