Automatic PAM Clustering Algorithm for Outlier Detection

In the past decade there has been intensive research on clustering algorithms for outlier detection, which have the advantages of simple modeling and effectiveness. In this paper, we propose an automatic PAM (Partitioning Around Medoids) clustering algorithm for outlier detection. The proposed methodology comprises two phases: clustering and computing outlier scores. During the clustering phase, we automatically determine the number of clusters by combining the PAM clustering algorithm with a specific cluster validation metric; this is vital for finding a clustering solution that best fits the given data set, especially for the PAM clustering algorithm. During the scoring phase, we assign each data instance an outlier score according to the resulting cluster structure. Experiments on different data sets show that the proposed algorithm achieves a higher detection rate together with a lower false alarm rate than state-of-the-art outlier detection techniques, and that it can be an effective solution for detecting outliers.
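The two phases described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the cluster validation metric, so the mean silhouette width is used as a stand-in; the outlier score (distance from a point to its own cluster medoid) is likewise hypothetical; and `pam` is a simplified swap search with random initial medoids rather than the full BUILD/SWAP procedure. All function names and the toy data are invented for the example, and the points are assumed to be distinct.

```python
import math
import random

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def assign(points, medoids):
    """Label each point with the position (0..k-1) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: math.dist(p, points[medoids[j]]))
            for p in points]

def pam(points, k, seed=0):
    """Simplified PAM: random initial medoids, then swaps while the cost drops."""
    medoids = random.Random(seed).sample(range(len(points)), k)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best - 1e-12:  # accept strictly improving swaps only
                    medoids, best, improved = trial, cost, True
    return medoids, assign(points, medoids)

def silhouette(points, labels, k):
    """Mean silhouette width (assumed validation metric); singletons count as 0."""
    clusters = [[p for p, l in zip(points, labels) if l == c] for c in range(k)]
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in clusters[c]) / len(clusters[c])
                for c in range(k) if c != l)
        total += (b - a) / max(a, b)
    return total / len(points)

def auto_pam(points, k_max=6):
    """Phase 1: run PAM for k = 2..k_max and keep the best-scoring clustering."""
    best = None
    for k in range(2, k_max + 1):
        medoids, labels = pam(points, k)
        score = silhouette(points, labels, k)
        if best is None or score > best[0]:
            best = (score, k, medoids, labels)
    return best[1], best[2], best[3]

def outlier_scores(points, medoids, labels):
    """Phase 2 (hypothetical score): distance from each point to its own medoid."""
    return [math.dist(p, points[medoids[l]]) for p, l in zip(points, labels)]

# Toy data: two tight groups plus one isolated point at (3, 3).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (3, 3),
          (10, 10), (10, 11), (11, 10), (11, 11)]
k, medoids, labels = auto_pam(points, k_max=4)
scores = outlier_scores(points, medoids, labels)
suspect = max(range(len(points)), key=scores.__getitem__)
print(k, points[suspect], round(scores[suspect], 3))
```

On this toy data the silhouette criterion selects two clusters, and the isolated point (3, 3) receives the largest score. A distance-to-medoid score cannot flag a point that ends up as its own singleton cluster, which is one reason the choice of validation metric and scoring rule matters in practice.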
