Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy

Detecting anomalous time points in time series without supervision is challenging: the model must learn informative representations and derive a distinguishable criterion. Prior methods mainly detect anomalies from the recurrent-network representation of each time point. However, point-wise representations are less informative for complex temporal patterns and can be dominated by normal patterns, making rare anomalies hard to distinguish. We find that each time point can also be described by its associations with all time points in the series, which form a point-wise distribution that is more expressive for temporal modeling. We further observe that, because anomalies are rare, it is harder for them to build strong associations with the whole series; their associations instead concentrate on adjacent time points. This observation implies an inherently distinguishable criterion between normal and abnormal points, which we highlight as the Association Discrepancy. Technically, we propose the Anomaly Transformer with an Anomaly-Attention mechanism to compute the association discrepancy. A minimax strategy is devised to amplify the normal-abnormal distinguishability of the association discrepancy. The Anomaly Transformer achieves state-of-the-art results on six unsupervised time series anomaly detection benchmarks spanning three applications: service monitoring, space & earth exploration, and water treatment.
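To make the criterion concrete, below is a minimal PyTorch sketch of one plausible realization of the association discrepancy (hypothetical helper names such as prior_association and association_discrepancy; a reading of the abstract, not the authors' reference implementation). The adjacent-concentration bias is modeled as a learnable distance-based Gaussian prior, the whole-series association as a standard self-attention map, and the discrepancy as a symmetrized KL divergence between the two row distributions.

import torch
import torch.nn.functional as F

def prior_association(sigma):
    # sigma: (B, L) learnable positive scales, one per time point.
    # Returns a (B, L, L) row-stochastic Gaussian kernel over |i - j|,
    # which by construction concentrates on adjacent time points.
    L = sigma.shape[-1]
    idx = torch.arange(L, device=sigma.device, dtype=sigma.dtype)
    dist = (idx[None, :] - idx[:, None]).abs()              # (L, L)
    p = torch.exp(-dist.pow(2) / (2 * sigma.unsqueeze(-1).pow(2)))
    return p / p.sum(dim=-1, keepdim=True)

def series_association(q, k):
    # Standard scaled dot-product attention map over the whole series.
    # q, k: (B, L, d). Returns S: (B, L, L), row-stochastic.
    return F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)

def association_discrepancy(p, s, eps=1e-8):
    # Symmetrized KL divergence between the row distributions of P and S,
    # returned as a (B, L) point-wise score. Anomalies, whose series-
    # association also concentrates on adjacent points, should score low.
    p, s = p.clamp_min(eps), s.clamp_min(eps)
    kl_ps = (p * (p.log() - s.log())).sum(-1)
    kl_sp = (s * (s.log() - p.log())).sum(-1)
    return kl_ps + kl_sp

# Example usage (random tensors, B=2, L=100, d=64):
# sigma = torch.rand(2, 100) + 0.5
# q, k = torch.randn(2, 100, 64), torch.randn(2, 100, 64)
# score = association_discrepancy(prior_association(sigma),
#                                 series_association(q, k))

Under the minimax strategy described above, one phase would pull the prior-association toward the (detached) series-association while the other pushes the series-association away from the (detached) prior, amplifying the normal-abnormal gap in this score.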
