Anomaly detection in streaming environmental sensor data: A data-driven modeling approach

The deployment of environmental sensors has generated interest in real-time applications of the data they collect. This research develops a real-time anomaly detection method for environmental data streams that can be used to identify data that deviate from historical patterns. The method is based on an autoregressive data-driven model of the data stream and its corresponding prediction interval: a measurement that falls outside the interval predicted from recent measurements is classified as anomalous. The method performs fast, incremental evaluation of data as they become available, scales to large quantities of data, and requires no pre-classification of anomalies. Furthermore, it can be easily deployed on a large heterogeneous sensor network. Sixteen instantiations of this method are compared based on their ability to identify measurement errors in a windspeed data stream from Corpus Christi, Texas. The results indicate that a multilayer perceptron model of the data stream, coupled with replacement of anomalous data points, performs well at identifying erroneous data in this data stream.
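
The idea of an autoregressive model paired with a prediction interval can be illustrated with a minimal sketch. This is not the implementation evaluated in the paper: it assumes a linear least-squares autoregressive model in place of the multilayer perceptron, a Gaussian-style interval of `z` residual standard deviations, and synthetic data rather than the Corpus Christi windspeed stream. The names `fit_ar`, `detect`, `q`, and `z` are hypothetical.

```python
import numpy as np

def fit_ar(series, q):
    """Least-squares fit of x[t] from the previous q values plus a bias term.

    Returns the weights and the residual standard deviation, which is used
    here as a rough basis for the prediction interval (an assumption).
    """
    X = np.array([series[t - q:t] for t in range(q, len(series))])
    X = np.column_stack([X, np.ones(len(X))])
    y = np.asarray(series[q:])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ w
    return w, resid.std(ddof=q + 1)

def detect(stream, history, w, sigma, z=3.0, replace=True):
    """Flag each new point that falls outside pred +/- z * sigma.

    With replace=True, a flagged point is replaced by the model prediction
    in the input window, so it does not corrupt later predictions.
    """
    q = len(w) - 1
    window = [float(v) for v in history[-q:]]
    flags = []
    for x in stream:
        pred = float(np.dot(w[:-1], window) + w[-1])
        is_anomaly = bool(abs(x - pred) > z * sigma)
        flags.append(is_anomaly)
        window = window[1:] + [pred if (is_anomaly and replace) else float(x)]
    return flags

# Synthetic stand-in for a sensor stream: a noisy periodic signal.
rng = np.random.default_rng(0)
t = np.arange(600)
clean = 10 + 3 * np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.2, t.size)

train, stream = clean[:500], clean[500:].copy()
stream[40] += 8.0  # inject one artificial measurement error

w, sigma = fit_ar(train, q=4)
flags = detect(stream, train, w, sigma)
```

Each incoming point costs one dot product, which is what makes this style of detector suitable for incremental evaluation. Whether to feed the measured or the predicted value back into the window after a detection is the design choice the abstract refers to as replacement of anomalous data points.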
