Online FDR-Controlled Anomaly Detection for Streaming Time Series

Our social network platform has more than 100 million daily active users, and we store a large number of user engagement metrics as time series in Google Cloud. Detecting anomalies in these time series robustly can yield meaningful insights and enable appropriate follow-up actions. In this paper, we tackle this problem by recasting it as a multiple testing problem in the statistical domain. We first decompose each time series with STL (Seasonal-Trend decomposition using Loess), and then propose a novel empirical Bayes procedure for online False Discovery Rate (FDR) control at any nominal level on the residual terms. Our main contribution is this online FDR control procedure, which is robust and fits naturally with our goal of streaming anomaly detection. Furthermore, the procedure is a general statistical tool for many other anomaly detection algorithms: it can be applied directly to score functions or error terms to determine a proper threshold, whereas in the literature such thresholds are often set empirically from training data. R code for reproducing the results in this paper is available at a link withheld for double-blind review.
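The abstract does not spell out the proposed empirical Bayes procedure, so the sketch below is not that procedure. It only illustrates the general idea of thresholding residuals with an online FDR rule instead of a hand-tuned cutoff, and it swaps in LORD++ (Ramdas et al., 2017), a published online FDR procedure. Assumptions, all ours: the residuals have already been produced by an STL decomposition; their null distribution is roughly normal and can be estimated by median/MAD on a warm-up window (a crude stand-in for an empirical null); and the resulting p-values are independent, which autocorrelated residuals may violate. The names `detect`, `LordPlusPlus`, `gamma`, and `two_sided_p`, and the weight sequence gamma_j = 6 / (pi^2 j^2), are hypothetical choices for illustration.

```python
import math
from statistics import median
from typing import List, Optional


def gamma(j: int) -> float:
    """Nonincreasing weights gamma_j = 6 / (pi^2 j^2), which sum to 1 over j >= 1."""
    return 6.0 / (math.pi ** 2 * j ** 2)


def two_sided_p(z: float) -> float:
    """Two-sided p-value P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


class LordPlusPlus:
    """LORD++ online FDR rule (Ramdas et al., 2017); assumes independent p-values."""

    def __init__(self, alpha: float = 0.05, w0: Optional[float] = None):
        self.alpha = alpha
        # Initial wealth W_0, required to satisfy 0 < W_0 <= alpha.
        self.w0 = alpha / 2 if w0 is None else w0
        self.t = 0                       # number of hypotheses tested so far
        self.rejections: List[int] = []  # rejection times tau_1 < tau_2 < ...

    def level(self) -> float:
        """Level alpha_t for the next hypothesis, t = self.t + 1."""
        t = self.t + 1
        a = self.w0 * gamma(t)
        for k, tau in enumerate(self.rejections):
            # The first rejection earns alpha - W_0; every later one earns alpha.
            weight = self.alpha - self.w0 if k == 0 else self.alpha
            a += weight * gamma(t - tau)
        return a

    def test(self, p: float) -> bool:
        """Test one incoming p-value at the current level and update the state."""
        a = self.level()
        self.t += 1
        reject = p <= a
        if reject:
            self.rejections.append(self.t)
        return reject


def detect(residuals: List[float], warmup: int, alpha: float = 0.05) -> List[int]:
    """Return indices i >= warmup whose residual is flagged as anomalous.

    The null N(med, scale^2) is fitted once on residuals[:warmup] via median/MAD;
    later residuals are then tested one at a time, as they would arrive in a stream.
    """
    base = residuals[:warmup]
    med = median(base)
    scale = 1.4826 * median(abs(x - med) for x in base)  # MAD -> normal sd
    if scale == 0:
        raise ValueError("warm-up residuals have zero MAD; cannot estimate the null")
    lord = LordPlusPlus(alpha)
    flagged = []
    for i in range(warmup, len(residuals)):
        z = (residuals[i] - med) / scale
        if lord.test(two_sided_p(z)):
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Synthetic residuals (as if from STL): bounded noise plus two injected spikes.
    res = [0.1 * ((i * 7) % 11 - 5) for i in range(200)]
    res[150], res[180] = 5.0, -4.0
    print(detect(res, warmup=100))  # prints [150, 180]
```

Only the two injected spikes are flagged; the bounded noise never produces a p-value below the LORD++ level, which never exceeds alpha. The fast-decaying gamma sequence keeps the sketch short, but slower-decaying sequences generally retain more power after long stretches without rejections.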
