Root-cause Analysis for Time-series Anomalies via Spatiotemporal Graphical Modeling in Distributed Complex Systems

Performance monitoring, anomaly detection, and root-cause analysis in complex cyber-physical systems (CPSs) are often intractable because of widely diverse operational modes, disparate data types, and complex fault-propagation mechanisms. This paper presents a new data-driven framework for root-cause analysis. It is based on a spatiotemporal graphical modeling approach that builds on the concept of symbolic dynamics to discover and represent causal interactions among the sub-systems of complex CPSs. We formulate root-cause analysis as a minimization problem over the proposed inference-based metric and present two approximate approaches to solve it: sequential state switching ($S^3$), based on the free-energy concept of a restricted Boltzmann machine (RBM), and artificial anomaly association ($A^3$), a classification framework using deep neural networks (DNNs). Synthetic data from cases with failed pattern(s) and anomalous node(s) are simulated to validate the proposed approaches. A real dataset based on the Tennessee Eastman process (TEP) is also used for comparison with other approaches. The results show that: (1) the $S^3$ and $A^3$ approaches achieve high root-cause analysis accuracy under both pattern-based and node-based fault scenarios, and they also handle multiple nominal operating modes; (2) the proposed tool-chain scales while maintaining high accuracy; and (3) the proposed framework is robust and adaptive under different fault conditions and outperforms the state-of-the-art methods it is compared with.
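The $S^3$ approach relies on the free energy of an RBM, which measures how well an input agrees with what the model learned from nominal data (lower free energy means a more probable, more "nominal-looking" input). The Python sketch below computes the standard free energy of a binary RBM and then ranks candidate units by how much the free energy drops when each one is switched back to its nominal state. The helper names (`free_energy`, `rank_root_causes`), the binary encoding of pattern states, the nominal reference vector, and the single-switch scoring rule are assumptions made for illustration. This is not presented as the paper's exact $S^3$ procedure, and the toy parameters below are not trained on any data.

```python
import math
from typing import List, Sequence, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def free_energy(v: Sequence[float], W: Sequence[Sequence[float]],
                b: Sequence[float], c: Sequence[float]) -> float:
    """Free energy of a binary RBM:
    F(v) = -sum_i b_i v_i - sum_j log(1 + exp(c_j + sum_i v_i W[i][j])).
    W has shape (num_visible, num_hidden); b are visible biases, c hidden biases.
    """
    visible_term = sum(bi * vi for bi, vi in zip(b, v))
    hidden_term = 0.0
    for j, cj in enumerate(c):
        activation = cj + sum(v[i] * W[i][j] for i in range(len(v)))
        hidden_term += softplus(activation)
    return -visible_term - hidden_term


def rank_root_causes(v_anomalous: Sequence[float], v_nominal: Sequence[float],
                     W: Sequence[Sequence[float]], b: Sequence[float],
                     c: Sequence[float]) -> List[Tuple[int, float]]:
    """Hypothetical S^3-style scoring (an assumption, not the paper's code):
    switch each deviating unit back to its nominal state, one at a time,
    and record the resulting drop in free energy.
    """
    base = free_energy(v_anomalous, W, b, c)
    scores = []
    for i, (obs, nom) in enumerate(zip(v_anomalous, v_nominal)):
        if obs == nom:
            continue  # only units that deviate from nominal are candidates
        switched = list(v_anomalous)
        switched[i] = nom
        scores.append((i, base - free_energy(switched, W, b, c)))
    # Largest free-energy reduction first: the strongest root-cause candidate.
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    # Toy, untrained parameters chosen only to make the ranking visible.
    W = [[0.0], [0.0], [0.0]]
    b = [-3.0, -0.5, 0.0]
    c = [0.0]
    print(rank_root_causes([1, 1, 0], [0, 0, 0], W, b, c))
```

Under this scoring, the unit whose switch yields the largest free-energy decrease is ranked first. A sequential variant could switch the top-ranked unit, re-score the remaining candidates, and repeat until the free energy falls below a nominal threshold; that stopping rule is likewise an assumption rather than a detail stated above.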
