Paper Information - Online FDR Controlled Anomaly Detection for Streaming Time Series

Online FDR Controlled Anomaly Detection for Streaming Time Series

As a large social network platform with more than 100 million daily active users, we have a large number of user engagement metrics stored in Google Cloud in the form of time series. Detecting anomalies in such time series data in a robust fashion can give meaningful insights and enable proper subsequent actions. In this paper, we tackle this problem by transforming it into a multiple testing problem in the statistical domain. We first use STL (seasonal-trend decomposition using Loess) to decompose the time series data into seasonal, trend, and residual components, then we propose a novel empirical Bayes procedure for online False Discovery Rate (FDR) control at any nominal level on the residual terms. Our main contribution is the novel online FDR control procedure, which is robust and fits nicely with our streaming anomaly detection goal. Furthermore, our online FDR control procedure is a powerful statistical tool for many other anomaly detection algorithms, since it can be applied directly to score functions or error terms to determine a proper threshold, which is often determined empirically from training data in the literature. R code for reproducing the results in the paper is provided in links hidden for double-blind review.
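The abstract's pipeline (decompose, take residuals, test them online against an FDR budget) can be illustrated with a minimal Python sketch. This is not the paper's empirical Bayes procedure, which the abstract does not specify. As assumptions for illustration only, a per-phase median subtraction stands in for STL, p-values come from a normal approximation on robust z-scores, and the well-known LOND rule stands in for the online FDR step. All function names and the toy data are hypothetical.

```python
import math
from statistics import median


def seasonal_residuals(series, period):
    """Crude stand-in for STL (assumption): subtract each seasonal phase's median."""
    phase_medians = [median(series[k::period]) for k in range(period)]
    return [x - phase_medians[i % period] for i, x in enumerate(series)]


def two_sided_p_values(residuals):
    """Normal-approximation p-values from robust (median/MAD) z-scores."""
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    scale = 1.4826 * mad or 1e-12  # guard against a zero MAD
    return [math.erfc(abs(r - center) / scale / math.sqrt(2)) for r in residuals]


def lond(p_values, alpha=0.05):
    """LOND online rule: alpha_i = alpha * gamma_i * (discoveries so far + 1)."""
    discoveries = 0
    flags = []
    for i, p in enumerate(p_values, start=1):
        gamma_i = 6.0 / (math.pi ** 2 * i ** 2)  # nonnegative weights summing to 1
        reject = p <= alpha * gamma_i * (discoveries + 1)
        discoveries += reject
        flags.append(reject)
    return flags


# Toy data (hypothetical): 8 weeks of a weekly pattern with small
# deterministic wiggle and one injected spike at index 10.
pattern = [10, 12, 14, 13, 11, 8, 7]
series = [pattern[i % 7] + ((i * 37) % 11 - 5) * 0.1 for i in range(56)]
series[10] += 50.0

flags = lond(two_sided_p_values(seasonal_residuals(series, period=7)))
print([i for i, flagged in enumerate(flags) if flagged])
```

For brevity the residuals and their scale are computed from the whole series at once. A genuinely streaming version would estimate both from past observations only, for example over a trailing window, while the LOND loop itself already processes one p-value at a time.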

Weinan Wang
