Deep Multi-Instance Contrastive Learning with Dual Attention for Anomaly Precursor Detection

Prognostics, or the early detection of incipient faults from monitoring time series data in complex systems, is valuable for automated system management and predictive maintenance. However, this task is challenging. First, it is hard to learn from multi-dimensional, heterogeneous time series data that contain various anomaly types. Second, precise annotations of incipient anomaly periods are lacking. Third, interpretable tools for diagnosing precursor symptoms are lacking. Despite recent progress, few existing approaches can jointly resolve these challenges. In this paper, we propose MCDA, a deep multi-instance contrastive learning approach with dual attention, to detect anomaly precursors. MCDA uses multi-instance learning to model the uncertainty of the precursor period, and employs a recurrent neural network with tensorized hidden states to extract precursor features encoded in the temporal dynamics as well as in the correlations between different pairs of time series. A dual attention mechanism over both the temporal dimension and the time series variables is developed to pinpoint the time period and the sensors in which the precursor symptoms occur. A contrastive loss is designed to address the scarcity of annotated anomalies. To the best of our knowledge, MCDA is the first method to study the ‘when’ and ‘where’ of anomaly precursor detection simultaneously. Extensive experiments on both synthetic and real datasets demonstrate the effectiveness of MCDA.
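
To make the ideas above concrete, the following is a minimal NumPy sketch of attention pooling over a bag of instances along two axes (variables and time), followed by a pairwise contrastive loss. It is an illustration under stated assumptions, not the MCDA implementation: the function names (`dual_attention_pool`, `contrastive_loss`), the linear scoring vectors `w_var` and `w_time`, the tensor layout `(T, D, H)`, and the margin-based form of the loss are all hypothetical choices made here, since the abstract does not specify them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dual_attention_pool(bag, w_var, w_time):
    """Pool a bag of instances with variable-level and temporal attention.

    bag    : array of shape (T, D, H), i.e. T instances (time segments),
             D time series variables, H hidden features per variable
             (an assumed layout standing in for tensorized hidden states).
    w_var  : array of shape (H,), assumed linear scorer for variables.
    w_time : array of shape (H,), assumed linear scorer for instances.
    """
    # Variable attention: which sensors matter within each instance.
    var_attn = softmax(bag @ w_var, axis=1)              # (T, D)
    inst = (var_attn[..., None] * bag).sum(axis=1)       # (T, H)

    # Temporal attention: which instances in the bag matter most.
    time_attn = softmax(inst @ w_time, axis=0)           # (T,)
    emb = (time_attn[:, None] * inst).sum(axis=0)        # (H,)
    return emb, time_attn, var_attn


def contrastive_loss(emb_a, emb_b, same_label, margin=1.0):
    """Generic margin-based pairwise contrastive loss (an assumed form).

    Pulls together embeddings of bags with the same label and pushes
    apart embeddings of bags with different labels, up to the margin.
    """
    d = np.linalg.norm(emb_a - emb_b)
    if same_label:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bag = rng.normal(size=(5, 3, 4))   # 5 instances, 3 sensors, 4 features
    w_var = rng.normal(size=4)
    w_time = rng.normal(size=4)
    emb, time_attn, var_attn = dual_attention_pool(bag, w_var, w_time)
    print("bag embedding shape:", emb.shape)
    print("most attended instance:", int(time_attn.argmax()))
    print("most attended sensor per instance:", var_attn.argmax(axis=1))
```

In this sketch the temporal weights play the role of the 'when' and the variable weights the role of the 'where': inspecting `time_attn` and `var_attn` shows which instance and which sensor contributed most to the bag-level embedding.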
