Outlier detection and classification in sensor data streams for proactive decision support systems

This paper deals with the problem of quality assessment in sensor data streams accumulated by proactive decision support systems. A new problem is stated in which outliers must be detected and classified according to their origin. Two types of outliers are defined: the first type is caused by misoperation of the system, and the second by changes in the observed system's behavior due to internal and external influences. The proposed method follows a data-driven forecasting approach and comprises a forecasting model and a clustering model. The forecasting model predicts the value expected in the incoming data stream at a given time, so that the deviation between the observed value and the predicted one can be computed. The clustering model is used for taxonomic classification of outliers. Constructive neural network models (CoNNS) and evolving connectionist systems (ECS) are used to predict sensor data. Two real-world tasks are used as case studies. The maximal accuracy values are 0.992 and 0.974, and the F1 scores are 0.967 and 0.938, for the first and second tasks, respectively. The conclusion presents findings on how to apply the proposed method in proactive decision support systems.
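
The two-stage idea described above (forecast, measure the deviation, then cluster the flagged points) can be illustrated with a minimal sketch. It is not the method of the paper: a moving-average forecaster stands in for the CoNNS and ECS models, a tiny one-dimensional two-means routine stands in for the clustering model, and the names `forecast`, `detect_outliers`, `two_means`, the `threshold` value, and the sample `stream` are all hypothetical. How the resulting clusters map onto the two outlier types is left open here.

```python
from statistics import mean


def forecast(history, window=3):
    # Stand-in forecaster (assumption): mean of the last `window` values.
    return mean(history[-window:])


def detect_outliers(stream, warmup=3, threshold=5.0):
    """Return (index, deviation) pairs for points far from their forecast."""
    history = list(stream[:warmup])
    flagged = []
    for i in range(warmup, len(stream)):
        predicted = forecast(history)
        deviation = stream[i] - predicted
        if abs(deviation) > threshold:
            flagged.append((i, deviation))
            # Feed the forecast back instead of the suspicious reading,
            # so the outlier does not distort later predictions.
            history.append(predicted)
        else:
            history.append(stream[i])
    return flagged


def two_means(values, iterations=20):
    """Tiny 1-D k-means with k=2; returns a cluster id (0 or 1) per value."""
    if not values:
        return []
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to the nearest center.
        labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Move each center to the mean of its members.
        for c in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = mean(members)
    return labels


# Hypothetical readings: one isolated spike, then a short sustained shift.
stream = [20.0, 20.5, 19.5, 20.0, 60.0, 20.5, 20.0, 31.0, 30.5, 31.5, 20.0]
flagged = detect_outliers(stream)
labels = two_means([abs(d) for _, d in flagged])
for (index, deviation), label in zip(flagged, labels):
    print(index, round(deviation, 2), label)
```

Clustering on the absolute deviation alone is the simplest possible feature choice; a practical system would likely use richer features of each flagged point.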
