NaNOD: A natural neighbour-based outlier detection algorithm

Outlier detection is an essential task in data mining, with applications including military surveillance, tax fraud detection, and telecommunications. In recent years, it has received significant attention compared with other knowledge discovery problems. This focus has led to the development of numerous outlier detection algorithms, most of them based on either distance or density. However, each strategy has intrinsic weaknesses: distance-based techniques struggle with varying local density, while density-based methods are known to have difficulty with low-density patterns. In addition, most existing outlier detection algorithms suffer from a parameter selection problem, which leads to poor detection results. In this article, we present an unsupervised density-based outlier detection algorithm that addresses these shortcomings. The proposed algorithm uses the Natural Neighbour (NaN) concept to adaptively obtain a parameter called the Natural Value (NV), and a Weighted Kernel Density Estimation (WKDE) method to estimate the density at the location of an object. Moreover, the algorithm employs two different categories of nearest neighbours, k Nearest Neighbours (kNN) and Reverse Nearest Neighbours (RNN), which make it flexible in modelling different data patterns. A Gaussian kernel function is adopted to achieve smoothness in the measure. Further, we use an adaptive kernel width to enhance the discrimination power between normal and outlier samples. Formal analysis and extensive experiments on both artificial and real datasets demonstrate that this technique achieves better outlier detection performance.
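The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions, since the abstract does not specify them: the stopping rule used to obtain the natural value (the number of objects without any reverse neighbour stops changing), the kernel width (each object's own distance to its k-th neighbour), the use of a plain, unweighted Gaussian KDE in place of WKDE, and the final relative-density score. The function names `natural_value` and `outlier_scores` are hypothetical.

```python
import numpy as np


def natural_value(X):
    """Assumed natural-neighbour search: grow r until the number of
    objects with no reverse neighbour stops changing (or reaches zero)."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    keyed = dist.copy()
    np.fill_diagonal(keyed, -1.0)  # force each object to sort first in its own row
    order = np.argsort(keyed, axis=1, kind="stable")[:, 1:]  # drop self
    rnn_count = np.zeros(n, dtype=int)
    prev_orphans = -1
    for r in range(1, n):
        # every object "votes" for its r-th nearest neighbour
        np.add.at(rnn_count, order[:, r - 1], 1)
        orphans = int(np.sum(rnn_count == 0))
        if orphans == 0 or orphans == prev_orphans:
            return r, order, dist
        prev_orphans = orphans
    return n - 1, order, dist


def outlier_scores(X):
    """Relative-density score; larger values suggest outliers."""
    X = np.asarray(X, dtype=float)
    n, dim = X.shape
    k, order, dist = natural_value(X)
    knn = order[:, :k]

    # neighbourhood = kNN plus reverse nearest neighbours (RNN)
    neigh = [set(row.tolist()) for row in knn]
    for i in range(n):
        for j in knn[i]:
            neigh[int(j)].add(i)

    # adaptive width (assumption): distance from each object to its k-th neighbour
    width = np.maximum(dist[np.arange(n), knn[:, -1]], 1e-12)

    density = np.empty(n)
    for i in range(n):
        idx = np.array(sorted(neigh[i]))
        u = dist[i, idx] / width[i]
        norm = (2.0 * np.pi) ** (dim / 2.0) * width[i] ** dim
        density[i] = np.mean(np.exp(-0.5 * u ** 2)) / norm  # Gaussian kernel

    # compare each object's density with that of its neighbourhood
    return np.array(
        [density[sorted(neigh[i])].mean() / density[i] for i in range(n)]
    )
```

In this sketch no neighbourhood size is supplied by the user: `k` is derived from the data by `natural_value`. An isolated object receives a wide kernel and therefore a low density estimate, so its score relative to its neighbours is large.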
