An Efficient Unsupervised Clustered Adaptive Antihub Technique for Outlier Detection in High Dimensional Data

Objective: To detect inconsistent objects (outliers) in high-dimensional data with reduced computation time and improved accuracy. Methods: Hubness, and specifically antihubs (points that rarely occur in the k-nearest-neighbor lists of other points), is a recently recognized concept for handling high-dimensional data. Antihub2 is an advanced version of the Antihub method that refines the outlier score a point obtains from Antihub. However, Antihub2 is slower to compute. This paper introduces AdaptiveAntihub2Clust, a clustered adaptive antihub technique for unsupervised outlier detection that aims to reduce computation time and improve accuracy. Findings: The existing Antihub2 method is compared with the proposed AdaptiveAntihub2Clust. The experimental results show that AdaptiveAntihub2Clust outperforms Antihub2, with a substantial decrease in computation time as well as an improvement in accuracy when the new approach is applied to outlier detection. Applications: Irrelevant objects may arise from numerous faults. Detecting such objects identifies mistakes and fraud before they escalate into serious consequences, and cleanses the data for further processing.
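
The antihub idea named in the Methods can be illustrated with a small sketch. It is not the paper's AdaptiveAntihub2Clust or Antihub2 procedure; it only counts how often each point appears in the k-nearest-neighbor lists of the other points and turns a low count into a high outlier score. The function names `k_occurrences` and `antihub_scores`, the brute-force Euclidean distance computation, and the scoring function 1 / (count + 1) are assumptions chosen for illustration.

```python
import numpy as np

def k_occurrences(X, k):
    """Count how often each point appears in other points' k-NN lists.

    Hypothetical helper: brute-force Euclidean distances, fine for small data.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    # A point is never its own neighbor.
    np.fill_diagonal(d2, np.inf)
    # Indices of the k nearest neighbors of every point.
    knn = np.argsort(d2, axis=1)[:, :k]
    # Number of k-NN lists in which each point occurs.
    return np.bincount(knn.ravel(), minlength=n)

def antihub_scores(X, k):
    """Assumed scoring: points that rarely occur in k-NN lists score close to 1."""
    return 1.0 / (k_occurrences(X, k) + 1.0)

if __name__ == "__main__":
    # Five points on a line plus one isolated point (toy data, not from the paper).
    X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [100.0]])
    scores = antihub_scores(X, k=2)
    print("most outlying index:", int(np.argmax(scores)))
```

On the toy data, the isolated point appears in no other point's neighbor list, so it receives the highest score. A clustered variant would presumably restrict such computations to points within the same cluster to save time, but that detail is not given in the abstract.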
