Real‐time Bayesian anomaly detection in streaming environmental data

With large volumes of data arriving in near real time from environmental sensors, there is a need for automated detection of anomalous data caused by sensor or transmission errors or by infrequent system behaviors. This study develops and evaluates three automated anomaly detection methods based on dynamic Bayesian networks (DBNs). These methods perform fast, incremental evaluation of data as they become available, scale to large quantities of data, and require no a priori information about process variables or the types of anomalies that may be encountered. The study investigates the methods' abilities to identify anomalies in eight meteorological data streams from Corpus Christi, Texas. The results indicate that the DBN-based detectors using either robust Kalman filtering or Rao-Blackwellized particle filtering outperform a DBN-based detector using standard Kalman filtering, with the former having false positive and false negative rates of less than 2%. The methods successfully identified data anomalies caused by two real events: a sensor failure and a large storm.
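To make the idea of incremental, filter-based detection concrete, the following is a minimal sketch of a Kalman-filter anomaly flagger for a single data stream. It is an illustration under stated assumptions, not the detectors evaluated in the study: the scalar random-walk state model, the function name `kalman_anomaly_flags`, the variances `q` and `r`, the threshold `z`, and the choice to treat a flagged reading as missing are all hypothetical and are not taken from the abstract.

```python
import math


def kalman_anomaly_flags(stream, q=0.01, r=0.25, z=3.0, x0=0.0, p0=1.0):
    """Flag readings outside the z-sigma one-step-ahead predictive interval.

    Assumed model (not from the source): a scalar random-walk state with
    process variance q, observed with Gaussian noise of variance r.
    """
    x, p = x0, p0  # current state mean and variance
    flags = []
    for y in stream:
        # Predict: a random walk keeps its mean; uncertainty grows by q.
        p_pred = p + q
        s = p_pred + r  # variance of the predicted measurement
        innovation = y - x

        # A reading is anomalous if it falls outside the predictive interval.
        is_anomaly = abs(innovation) > z * math.sqrt(s)
        flags.append(is_anomaly)

        if is_anomaly:
            # Assumption: treat the reading as missing and skip the update,
            # so the anomaly does not corrupt the state estimate.
            p = p_pred
        else:
            # Standard Kalman measurement update.
            k = p_pred / s
            x = x + k * innovation
            p = (1.0 - k) * p_pred
    return flags


# Hypothetical readings with one obvious spike at index 4.
readings = [0.1, -0.2, 0.0, 0.3, 8.0, 0.1, -0.1]
flags = kalman_anomaly_flags(readings)
print(flags)
```

Each reading is processed once with constant work and memory, which is what allows this kind of detector to keep pace with a stream. A robust Kalman filter or a Rao-Blackwellized particle filter would replace the Gaussian measurement assumption used here with a model that explicitly accounts for outlying measurements.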
