A novel outlier cluster detection algorithm without top-n parameter

Outlier detection is an important task in data mining with numerous applications, such as credit card fraud detection and video surveillance, and it has been widely studied in recent years. Although many outlier detection algorithms have been proposed, most of them face the top-n problem: it is difficult to know in advance how many points in a database are outliers. In this paper, we extend the concept of the outlier factor of an object to the case of a cluster and propose a novel outlier cluster detection algorithm called ROCF, based on the concept of the mutual neighbor graph and on the idea that outlier clusters are usually much smaller than normal clusters. ROCF automatically determines the outlier rate of a database and effectively detects outliers and outlier clusters without a top-n parameter. Formal analysis and experiments show that this method achieves good performance in outlier detection.
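The two ideas named in the abstract, a mutual neighbor graph and the small size of outlier clusters, can be illustrated with a minimal sketch. This is not the ROCF algorithm itself, whose outlier factor is not defined in the abstract. The sketch assumes a mutual k-nearest-neighbor graph, treats its connected components as clusters, and flags the components that fall below the largest relative jump in sorted cluster size. The function names, the neighbor count `k`, and the `jump` threshold are all hypothetical.

```python
from math import dist


def mutual_knn_graph(points, k):
    """Link i and j only if each is among the other's k nearest neighbors."""
    n = len(points)
    knn = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(points[i], points[j]))
        knn.append(set(others[:k]))
    return [{j for j in knn[i] if i in knn[j]} for i in range(n)]


def connected_components(adj):
    """Return the connected components of an adjacency-set graph."""
    seen, comps = set(), []
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(sorted(comp))
    return comps


def small_cluster_outliers(points, k, jump=2.0):
    """Flag points in clusters below the largest relative size jump.

    `jump` is an assumed threshold (not from the paper): if no consecutive
    pair of sorted cluster sizes differs by at least this ratio, nothing
    is reported as an outlier.
    """
    comps = sorted(connected_components(mutual_knn_graph(points, k)), key=len)
    best_ratio, cut = 1.0, 0
    for idx in range(1, len(comps)):
        ratio = len(comps[idx]) / len(comps[idx - 1])
        if ratio > best_ratio:
            best_ratio, cut = ratio, idx
    if best_ratio < jump:
        return []
    return sorted(i for comp in comps[:cut] for i in comp)


# Two normal clusters of six points each, plus a far-away pair.
cluster_a = [(float(x), 0.0) for x in range(0, 6)]
cluster_b = [(float(x), 0.0) for x in range(20, 26)]
far_pair = [(100.0, 0.0), (101.0, 0.0)]
data = cluster_a + cluster_b + far_pair
outliers = small_cluster_outliers(data, k=2)
```

In this sketch the number of reported outliers follows from the cluster sizes rather than from a user-supplied top-n value, which is the property the abstract emphasizes. How ROCF actually scores clusters and chooses the cut is left to the paper.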
