AnomalyBench: An Open Benchmark for Explainable Anomaly Detection

Access to high-quality data repositories and benchmarks has been instrumental in advancing the state of the art in many domains, as it provides the research community with common ground for training, testing, evaluating, comparing, and experimenting with novel machine learning models. The lack of such community resources for anomaly detection (AD) severely limits progress. In this report, we present AnomalyBench, the first comprehensive benchmark for explainable AD over high-dimensional (2000+) time series data. AnomalyBench was systematically constructed from real data traces of ~100 repeated executions of 10 large-scale stream processing jobs on a Spark cluster. More than 30 of these executions were disturbed by injecting ~100 instances of different types of anomalous events (e.g., misbehaving inputs, resource contention, process failures). Each anomaly instance carries ground-truth labels for both its root-cause interval and its effect interval, supporting AD tasks as well as explanation discovery (ED) via root-cause analysis. We demonstrate the key design features and practical utility of AnomalyBench through an experimental study with three state-of-the-art semi-supervised AD techniques.
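To make the labeling scheme concrete, the sketch below shows one plausible way such a ground-truth record could be represented. This is a minimal illustration only, not the released data format: the field names (trace_id, root_cause_start, effect_end, etc.) are hypothetical, and only the idea of pairing a root-cause interval with an effect interval per anomaly instance comes from the benchmark description above.

```python
# Minimal sketch (hypothetical schema, not the released AnomalyBench format):
# one ground-truth record pairing a root-cause interval with an effect interval.
from dataclasses import dataclass

@dataclass
class AnomalyLabel:
    trace_id: str            # which disturbed execution the anomaly belongs to
    anomaly_type: str        # e.g., "misbehaving_input", "resource_contention"
    root_cause_start: float  # start of the root-cause interval (seconds into the trace)
    root_cause_end: float    # end of the root-cause interval
    effect_start: float      # start of the observable effect interval
    effect_end: float        # end of the observable effect interval

# Example: a resource-contention anomaly whose observable effect lags its root cause.
label = AnomalyLabel(
    trace_id="job7_run12",
    anomaly_type="resource_contention",
    root_cause_start=120.0, root_cause_end=180.0,
    effect_start=135.0, effect_end=240.0,
)

# AD methods would be scored against the effect interval, while explanation
# discovery / root-cause analysis would be scored against the root-cause interval.
print(label)
```

Keeping the two intervals separate is what lets a single labeled instance serve both evaluation tasks described above.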
