An Improved Overlapping Clustering Algorithm to Detect Outlier

The MCOKE algorithm, which assigns data objects to multiple clusters, is known for its simplicity and effectiveness. Its drawback is its reliance on maxdist as a global threshold for assigning objects to one or more clusters, because maxdist is sensitive to outliers. Outliers in a dataset can significantly reduce the effectiveness of maxdist for overlapping clustering. In this paper, outlier detection is incorporated into the MCOKE algorithm so that outliers are detected and removed before they can participate in the calculation that assigns objects to one or more clusters. The improved MCOKE algorithm identifies overlapping clusters more reliably. Performance was evaluated using the F1 score. The evaluation results show that the outlier detection achieved a higher accuracy rate in identifying abnormal data (outliers) when applied to real datasets.
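To illustrate why a global maxdist threshold is sensitive to outliers, the following is a minimal sketch, not the paper's implementation. It assumes that maxdist is the largest distance between any object and its nearest centroid, that centroids are already available (for example from a prior k-means run), and that outliers are flagged with a median-absolute-deviation rule; the abstract does not state which detection rule the authors use. The function names and the `cutoff` parameter are hypothetical.

```python
import numpy as np


def centroid_distances(points, centroids):
    """Return the (n_points, n_centroids) Euclidean distance matrix."""
    diffs = points[:, None, :] - centroids[None, :, :]
    return np.linalg.norm(diffs, axis=2)


def mad_outlier_mask(values, cutoff=3.0):
    """Flag values whose robust z-score exceeds `cutoff`.

    Assumption: a median-absolute-deviation rule stands in for the
    paper's outlier detector, which the abstract does not specify.
    """
    median = np.median(values)
    mad = 1.4826 * np.median(np.abs(values - median))
    if mad == 0:
        return np.zeros(values.shape, dtype=bool)
    return np.abs(values - median) / mad > cutoff


def overlapping_assignment(points, centroids, remove_outliers=True):
    """Assign each object to every cluster whose centroid is within maxdist."""
    dists = centroid_distances(points, centroids)
    nearest = dists.min(axis=1)
    if remove_outliers:
        outliers = mad_outlier_mask(nearest)
    else:
        outliers = np.zeros(len(points), dtype=bool)
    # Assumed definition: maxdist is the largest nearest-centroid distance
    # among the objects that were kept.
    maxdist = nearest[~outliers].max()
    # Removed outliers receive no cluster membership at all.
    membership = (dists <= maxdist) & ~outliers[:, None]
    return membership, maxdist, outliers


# Toy data: two groups, one object halfway between them, one far outlier.
centroids = np.array([[0.0, 0.0], [6.0, 0.0]])
points = np.array([
    [0.5, 0.0], [-1.0, 0.0], [0.0, 1.5], [-2.0, 0.0],   # near centroid 0
    [6.5, 0.0], [7.0, 0.0], [6.0, 1.5], [8.0, 0.0],     # near centroid 1
    [3.0, 0.0],                                         # between both
    [3.0, 40.0],                                        # outlier
])

membership, maxdist, outliers = overlapping_assignment(points, centroids)
membership_raw, maxdist_raw, _ = overlapping_assignment(
    points, centroids, remove_outliers=False
)
print("maxdist with outlier removal:", maxdist)
print("maxdist without outlier removal:", maxdist_raw)
print("clusters per object (with removal):", membership.sum(axis=1))
print("clusters per object (without removal):", membership_raw.sum(axis=1))
```

In this toy setup, the single distant object inflates maxdist so much that every object falls within the threshold of both centroids when outliers are kept. With the outlier removed first, only the object lying between the two groups is assigned to both clusters.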
