Improved Hybrid Clustering and Distance-based Technique for Outlier Removal

Outlier detection is the task of finding objects that are dissimilar to, or inconsistent with, the rest of the data. It is used in applications such as fraud detection, network intrusion detection and clinical diagnosis of diseases. Clustering algorithms are frequently used for outlier detection, but they handle outliers only to the extent needed to keep them from interfering with the clustering process. In these algorithms, outliers are merely by-products of clustering, and the algorithms cannot rank outliers by priority. In this paper, three partition-based algorithms, PAM, CLARA and CLARANS, are each combined with medoid-distance-based outlier detection to improve the outlier detection and removal process. The experimental results show that the CLARANS clustering algorithm, when combined with medoid-distance-based outlier detection, improves detection accuracy and time efficiency.
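
The general idea of pairing a medoid-based clustering step with a distance-to-medoid outlier test can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the clustering step here is a simple alternating k-medoids loop standing in for PAM, CLARA or CLARANS, and the threshold rule (flag a point whose distance to its medoid exceeds `factor` times the cluster's mean medoid distance) is an assumption made for illustration. The function names, the `init` and `factor` parameters and the sample points are all hypothetical.

```python
import math
import random


def k_medoids(points, k, init=None, max_iter=100, seed=0):
    """Alternating k-medoids: a simplified stand-in for PAM, CLARA or CLARANS."""
    rng = random.Random(seed)
    medoids = list(init) if init is not None else rng.sample(range(len(points)), k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, points[medoids[c]]))
            for p in points
        ]
        # Update step: the new medoid is the member with the smallest total
        # distance to the other members of its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[c])
                continue
            best = min(
                members,
                key=lambda i: sum(math.dist(points[i], points[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels


def medoid_distance_outliers(points, medoids, labels, factor=2.0):
    """Return indices of flagged points, most extreme first.

    A point is flagged when its distance to its cluster medoid exceeds
    `factor` times the mean medoid distance of that cluster. This threshold
    rule is an assumption made for illustration.
    """
    scored = []
    for c, m in enumerate(medoids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        dists = {i: math.dist(points[i], points[m]) for i in members}
        mean = sum(dists.values()) / len(dists)
        for i, d in dists.items():
            if mean > 0 and d > factor * mean:
                scored.append((d / mean, i))
    # Sorting by the distance ratio gives a simple priority ranking.
    return [i for _, i in sorted(scored, reverse=True)]


# Hypothetical data: two tight groups and one distant point (index 8).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 20)]
medoids, labels = k_medoids(points, k=2, init=[0, 4])
flagged = medoid_distance_outliers(points, medoids, labels, factor=2.0)
print(flagged)
```

Because each flagged point carries a score (its distance divided by the cluster mean), the result is an ordered list and not just a set of leftovers from clustering. Flagged points can then be removed and the clustering rerun on the cleaned data.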
