Outliers detection in environmental monitoring databases

Environmental monitoring is now an important task in many industrial operations. To comply with strict environmental laws, these operations have implemented monitoring systems based on a network of air quality and meteorological stations that provide real-time measurements of key variables associated with the distribution of pollutants in surrounding areas. These measurements can be contaminated by outliers, which must be discarded to obtain a consistent dataset. This work presents a nonlinear procedure for outlier detection based on residual analysis of regression with Partial Least Squares and Artificial Neural Networks. To minimize the negative effect of outliers in the training dataset, a learning algorithm with regularization is proposed. The algorithm is based on a Quasi-Newton optimization method. It was tested on a simulated nonlinear process and on real environmental monitoring data contaminated with synthetic outliers, and was finally applied to real data from a monitoring station containing natural outliers. The results are encouraging, and further developments are foreseen to include information from neighboring stations and emission source operation.
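
The idea of detecting outliers through regression residuals can be illustrated with a minimal sketch. It is not the procedure of this work: a ridge-regularized linear regression stands in for the Partial Least Squares and neural network models, and a median absolute deviation (MAD) rule stands in for whatever residual test the authors apply. The function name `flag_outliers_by_residual`, the `ridge` and `threshold` parameters, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def flag_outliers_by_residual(X, y, ridge=1e-2, threshold=3.5):
    """Fit a ridge-regularized linear model and flag samples with large residuals.

    Stand-in for the PLS / neural network regression described in the abstract.
    Returns a boolean mask (True = suspected outlier) and the raw residuals.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Design matrix with an intercept column.
    A = np.column_stack([np.ones(len(X)), X])

    # Ridge penalty on the slopes only; the intercept is left unpenalized.
    penalty = ridge * np.eye(A.shape[1])
    penalty[0, 0] = 0.0
    coef = np.linalg.solve(A.T @ A + penalty, A.T @ y)

    residuals = y - A @ coef

    # Robust residual scale: 1.4826 * MAD approximates the standard deviation
    # for Gaussian noise and is much less sensitive to outliers.
    center = np.median(residuals)
    mad = np.median(np.abs(residuals - center))
    scale = max(1.4826 * mad, np.finfo(float).tiny)

    scores = np.abs(residuals - center) / scale
    return scores > threshold, residuals


# Synthetic example (hypothetical data): a linear process with small noise
# and three injected outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
outlier_idx = [10, 50, 120]
y[outlier_idx] += 5.0

flags, residuals = flag_outliers_by_residual(X, y)
print("flagged samples:", np.flatnonzero(flags))
```

A robust scale is used here because a plain standard deviation would itself be inflated by the outliers, which is the same concern that motivates a regularized learning algorithm for the training data.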
