Outliers Detection in One Dimensional Meteorological Data Stream

Sensor networks are among the technologies most widely used to gather information from the environment. They collect large volumes of data and send them to base stations for processing. For meteorological monitoring, the sensors usually provide one-dimensional data such as temperature, precipitation, and humidity. These data are generated so quickly that they form what is commonly called a data stream. The stream usually contains erroneous values, called outliers, that need to be removed. Many outlier detection algorithms are designed for data with a static distribution and do not account for changes in that distribution over time. However, a dynamic data distribution is one of the main characteristics of meteorological data. Moreover, the speed of the data stream requires outlier detection algorithms to process data very quickly; otherwise a significant amount of data could be lost. Most of the algorithms studied in the literature are arguably unsuitable for outlier detection in meteorological data streams, both because of their time complexity and because of their weak ability to detect contextual outliers. This work proposes a new outlier detection algorithm for one-dimensional numeric data streams, based on two filters, that offers a complexity of O(n) and detects contextual outliers with good precision.
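The abstract does not specify the two filters, so the following is only an illustrative sketch of how a two-stage, single-pass detector for a one-dimensional stream might look, not the authors' algorithm. It assumes a first filter that rejects physically implausible readings and a second filter that compares each reading with the median of a fixed-size window of recently accepted values. The function name `detect_outliers` and the parameters `lower`, `upper`, `window`, `k`, and `min_mad` are all hypothetical.

```python
from collections import deque
from statistics import median


def detect_outliers(stream, lower, upper, window=5, k=3.0, min_mad=0.5):
    """Yield (value, is_outlier) for each reading of a 1-D stream.

    Hypothetical two-stage sketch; both filters are assumptions,
    not the filters proposed in the paper.
    """
    recent = deque(maxlen=window)  # last accepted (non-outlier) readings
    for x in stream:
        # Filter 1 (assumed): reject values outside a plausible range.
        if not (lower <= x <= upper):
            yield x, True
            continue
        # Filter 2 (assumed): contextual check against the local window,
        # applied only once the window is full.
        if len(recent) == window:
            med = median(recent)
            # Median absolute deviation, floored to avoid a zero threshold.
            mad = max(median(abs(v - med) for v in recent), min_mad)
            if abs(x - med) > k * mad:
                yield x, True
                continue
        recent.append(x)
        yield x, False


# Temperatures in degrees Celsius: 35.0 is plausible in general but not
# in its local context; -80.0 is outside the assumed plausible range.
readings = [20.1, 20.3, 20.2, 20.4, 20.3, 35.0, 20.5, -80.0, 20.6]
flags = [flag for _, flag in detect_outliers(readings, lower=-60.0, upper=60.0)]
```

Because the window size is fixed, each reading costs a constant amount of work, so a stream of n readings is processed in O(n) time. One limitation of this sketch is that rejected readings never enter the window, so a genuine abrupt shift in the distribution would keep being flagged; a practical detector would need a rule for adapting to such changes.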
