Observing the unobservable: distributed online outlier detection in wireless sensor networks

Raw sensor observations often suffer from low data quality and reliability due to both internal and external factors, including the low quality of cheap sensors, the dynamic nature of network conditions, and the harshness of the deployment environment. Using low-quality sensor data in data analysis and decision making not only degrades the analysis results and the decisions made but also wastes a large amount of valuable and limited network resources such as energy, because many incorrect values are transmitted. Low-quality sensor data also prevents wireless sensor networks (WSNs) from fulfilling their promise of reliable real-time situation awareness, as it may generate a large number of false alarms. Motivated by the need to improve the quality of data analysis and decision making, to use WSN resources more efficiently by preventing unnecessary transmission of erroneous sensor observations, and to increase the effectiveness of the monitoring and situation-awareness capabilities of WSNs, in this thesis we focus on the online identification of outliers whenever and wherever they occur. Outliers in WSNs are those observations that represent erroneous values (errors) or indicate particular phenomenal changes (events). Our outlier detection techniques, which are based on distributed in-network data processing, identify sensor observations that do not conform to the normal behavior of sensor data without using a pre-defined threshold or triggering conditions. Our main research objective is to design and implement effective and efficient outlier detection techniques for WSNs that identify outliers in an online and distributed manner and distinguish between errors and events with high accuracy and a low false alarm rate, while keeping communication, computation, and memory complexity low. The main contributions of this thesis can be summarized as:
1. Providing a technique-based taxonomy and a guideline for outlier detection techniques for WSNs.
2. Designing and comparing data labelling techniques for the performance evaluation of outlier detection techniques.
3. Proposing statistical-based outlier detection techniques for WSNs.
4. Proposing spherical support vector machine (SVM)-based outlier detection techniques for WSNs.
5. Proposing ellipsoidal SVM-based outlier detection techniques for WSNs.
