A Participation Degree-Based Fault Detection Method for Wireless Sensor Networks

In wireless sensor networks (WSNs), outlier detection underlies many challenging tasks, such as fault detection, fraud detection, and intrusion detection. The participation degree of instances in the hierarchical clustering process reflects the relationships between instances, yet most existing algorithms ignore this information. We therefore propose a novel fault detection technique based on the participation degree, called fault detection based on participation degree (FDP). The algorithm has two advantages. First, it requires no training on labeled datasets; instead, it uses the participation degree to measure the differences between fault points and normal points, without setting distance or density parameters. Second, FDP can detect global outliers without being influenced by local clusters. Experiments on synthetic and real-world datasets demonstrate the performance of FDP in comparison with four well-known techniques: isolation forest (IF), local outlier factor (LOF), one-class support vector machine (OCS), and robust covariance (RC).
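The four baseline techniques named in the abstract can be sketched with scikit-learn as follows. This is only a minimal illustration under stated assumptions, not the authors' experimental setup: the synthetic data, the `contamination` value, and all other parameters are hypothetical. Mapping "robust covariance" to `EllipticEnvelope` is also an assumption. FDP itself is not implemented here, because the abstract does not define the participation degree in enough detail.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Hypothetical synthetic data: one Gaussian cluster of normal readings
# plus a few fault points placed on a distant ring (assumption, not the
# paper's datasets).
rng = np.random.default_rng(0)
n_normal, n_fault = 200, 10
normal = rng.normal(0.0, 1.0, size=(n_normal, 2))
angles = rng.uniform(0.0, 2.0 * np.pi, size=n_fault)
faults = 8.0 * np.column_stack([np.cos(angles), np.sin(angles)])

X = np.vstack([normal, faults])
# scikit-learn convention: +1 = inlier, -1 = outlier.
y_true = np.r_[np.ones(n_normal, dtype=int), -np.ones(n_fault, dtype=int)]

# Assumed fraction of outliers; the true fraction is used here for simplicity.
contamination = n_fault / len(X)

detectors = {
    "IF": IsolationForest(contamination=contamination, random_state=0),
    "LOF": LocalOutlierFactor(n_neighbors=20, contamination=contamination),
    "OCS": OneClassSVM(nu=contamination, kernel="rbf", gamma="scale"),
    "RC": EllipticEnvelope(contamination=contamination, random_state=0),
}

results = {}
for name, detector in detectors.items():
    # fit_predict labels every row of X as +1 (normal) or -1 (outlier).
    y_pred = detector.fit_predict(X)
    # F1 score with the outlier class (-1) treated as the positive class.
    score = f1_score(y_true, y_pred, pos_label=-1)
    results[name] = (y_pred, score)
    print(f"{name}: flagged {int((y_pred == -1).sum())} points, F1 = {score:.3f}")
```

Note that all four baselines require a parameter tied to the expected outlier fraction or neighborhood size (`contamination`, `nu`, `n_neighbors`), which is the kind of setting the abstract says FDP avoids.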
