A linear time method for the detection of point and collective anomalies

The challenge of efficiently identifying anomalies in data sequences is an important statistical problem that now arises in many applications. Whilst there has been substantial work aimed at making statistical analyses robust to outliers, or point anomalies, there has been much less work on detecting anomalous segments, or collective anomalies. By bringing together ideas from changepoint detection and robust statistics, we introduce Collective And Point Anomalies (CAPA), a computationally efficient approach that is suitable when collective anomalies are characterised by a change in mean, in variance, or in both, and that distinguishes them from point anomalies. Theoretical results establish the consistency of CAPA in detecting collective anomalies, and empirical results show that CAPA has close to linear computational cost and is more accurate at detecting and locating collective anomalies than other approaches. We demonstrate the utility of CAPA through its ability to detect exoplanets in light curve data from the Kepler telescope.
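The combination of changepoint detection and robust statistics can be illustrated with a small penalised-cost dynamic program. This is a simplified sketch under stated assumptions, not the authors' CAPA implementation. The data are robustly standardised with the median and MAD. Collective anomalies are modelled as a change in mean only. A point anomaly is simplified to a fixed penalty `beta_point`. The penalties `beta = 4 log n` and `beta_point = 3 log n` are assumed defaults, not recommended values. The function names `robust_standardise` and `detect_anomalies` are hypothetical. The sketch runs in O(n^2) time with no pruning, so it does not reproduce the close to linear cost described above.

```python
import numpy as np


def robust_standardise(x):
    """Hypothetical helper: centre by the median and scale by the MAD.

    Assumes typical data share one mean and variance, so robust estimates
    of both are barely affected by the anomalies themselves.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))  # consistent for a Gaussian sd
    if mad == 0:
        raise ValueError("MAD is zero; cannot standardise")
    return (x - med) / mad


def detect_anomalies(x, beta=None, beta_point=None, min_seg_len=2):
    """Hypothetical O(n^2) penalised-cost segmentation (no pruning).

    Each standardised point z_t is labelled as
      * typical:        cost z_t^2
      * point anomaly:  cost beta_point (simplification: its mean is left free)
      * part of a collective anomaly z_s..z_e with its own mean:
                        cost sum (z - segment mean)^2 + beta
    Returns (list of inclusive (start, end) segments, list of point indices).
    """
    z = robust_standardise(x)
    n = len(z)
    if beta is None:
        beta = 4.0 * np.log(n)        # assumed penalty, not a tuned value
    if beta_point is None:
        beta_point = 3.0 * np.log(n)  # assumed penalty, not a tuned value

    # Prefix sums give O(1) cost for any segment z[s:t].
    cs = np.concatenate(([0.0], np.cumsum(z)))
    cs2 = np.concatenate(([0.0], np.cumsum(z ** 2)))

    F = np.zeros(n + 1)       # F[t] = optimal cost of z[0:t]
    back = [None] * (n + 1)   # back[t] = (label, start index of last block)
    for t in range(1, n + 1):
        # Option 1: z[t-1] is typical.
        best, choice = F[t - 1] + z[t - 1] ** 2, ("typical", t - 1)
        # Option 2: z[t-1] is a point anomaly.
        cand = F[t - 1] + beta_point
        if cand < best:
            best, choice = cand, ("point", t - 1)
        # Option 3: z[s:t] is a collective anomaly of length >= min_seg_len.
        for s in range(0, t - min_seg_len + 1):
            m = t - s
            seg_sum = cs[t] - cs[s]
            cost = (cs2[t] - cs2[s]) - seg_sum ** 2 / m + beta
            if F[s] + cost < best:
                best, choice = F[s] + cost, ("collective", s)
        F[t], back[t] = best, choice

    # Backtrack from the end to recover the labels.
    segments, points = [], []
    t = n
    while t > 0:
        label, s = back[t]
        if label == "collective":
            segments.append((s, t - 1))
        elif label == "point":
            points.append(s)
        t = s
    return segments[::-1], points[::-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    x[100:130] += 3.0  # planted collective anomaly (mean shift)
    x[50] += 10.0      # planted point anomaly
    segments, points = detect_anomalies(x)
    print("collective:", segments, "point:", points)
```

Because a collective anomaly must span at least `min_seg_len` observations and pay `beta`, a single isolated spike is cheaper to explain with `beta_point`, while a sustained shift is cheaper to explain as one segment. This is how the sketch separates the two kinds of anomaly.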