Improved real-time data anomaly detection using context classification

The number of automated measuring and reporting systems used in water distribution and sewer systems is increasing dramatically and, as a consequence, so is the volume of data acquired. Since real-time data are likely to contain a certain number of anomalous values and data acquisition equipment is not perfect, it is essential to equip the SCADA (Supervisory Control and Data Acquisition) system with automatic procedures that can detect the related problems and assist the user in monitoring and managing the incoming data. A number of anomaly detection techniques and methods exist and can be used with varying success. To improve their performance, these methods must be fine-tuned to the crucial aspects of the monitored process and to the contexts into which the data are classified. The aim of this paper is to explore whether data context classification and pre-processing techniques can be used to improve anomaly detection methods, especially in fully automated systems. The methodology developed is tested on sets of real-life data, using different standard and experimental anomaly detection procedures, including statistical, model-based and data-mining approaches. The results obtained clearly demonstrate the effectiveness of the suggested anomaly detection methodology.
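
The idea of tuning a detector to the context of each measurement can be illustrated with a minimal sketch. It is not the methodology of the paper: the context labels ("dry", "wet"), the function name `flag_by_context`, the sample values and the median/MAD rule with threshold `k = 3.5` are all assumptions chosen for illustration. The sketch compares a simple robust statistical test applied to all data at once with the same test applied separately within each context.

```python
from collections import defaultdict
from statistics import median


def flag_by_context(values, contexts, k=3.5):
    """Flag values that deviate from the median of their own context
    by more than k scaled median absolute deviations (MAD).

    Hypothetical helper for illustration only.
    """
    # Group the measurements by their context label.
    groups = defaultdict(list)
    for value, context in zip(values, contexts):
        groups[context].append(value)

    # Robust location and scale per context.
    stats = {}
    for context, group in groups.items():
        centre = median(group)
        mad = median(abs(v - centre) for v in group)
        stats[context] = (centre, 1.4826 * mad)  # 1.4826: normal-consistency factor

    # A value is anomalous if it lies far from its own context's centre.
    flags = []
    for value, context in zip(values, contexts):
        centre, scale = stats[context]
        flags.append(scale > 0 and abs(value - centre) > k * scale)
    return flags


# Made-up flow readings: low in dry weather, high in wet weather.
dry = [10.0, 10.5, 9.8, 10.2, 9.9, 10.1, 10.3, 9.7, 30.0]
wet = [48.0, 52.0, 50.0, 49.0, 51.0, 53.0, 47.0, 50.5]
values = dry + wet
contexts = ["dry"] * len(dry) + ["wet"] * len(wet)

# Same test, with and without context information.
context_flags = flag_by_context(values, contexts)
global_flags = flag_by_context(values, ["all"] * len(values))

flagged = [v for v, f in zip(values, context_flags) if f]
print("flagged with context:", flagged)
print("flagged without context:", [v for v, f in zip(values, global_flags) if f])
```

In this made-up data set the reading of 30.0 is unremarkable when all values are pooled, because it lies between the two regimes, but it stands out once it is compared only with other dry-weather readings. Any detector, statistical, model-based or data-mining, could take the place of the median/MAD rule in the same structure.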
