Parameter k search strategy in outlier detection

Abstract Selecting the parameter k (the number of nearest neighbors) is an important problem in outlier detection. If k is too small, outlier clusters may go undetected; conversely, if k is too large, normal points may be flagged as outliers. To address this parameter selection problem, recent studies choose k by searching for a natural or stable relative neighborhood. However, these studies choose k intuitively and do not explain why the chosen k is appropriate. In this paper, we analyze these questions and present a parameter k search algorithm based on the mutual neighbor graph (MNG). Furthermore, we prove from three angles that the chosen k is appropriate. Experiments on synthetic and real data sets demonstrate that the proposed method achieves better performance than alternative methods.
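
To make the role of k concrete, the following minimal sketch builds a mutual k-nearest-neighbor graph on a toy data set and lists the points left without any mutual neighbor. It is an illustration only, not the search algorithm proposed in the paper: the function names (`knn_indices`, `mutual_neighbor_graph`, `isolated_points`), the Euclidean distance, and the toy points are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: dist(p, points[j]))
        neighbors.append(set(order[:k]))
    return neighbors


def mutual_neighbor_graph(points, k):
    """Edges (i, j) with i < j where each point is among the other's k nearest neighbors."""
    nn = knn_indices(points, k)
    return {(i, j) for i in range(len(points)) for j in nn[i]
            if i < j and i in nn[j]}


def isolated_points(points, k):
    """Indices of points that have no mutual neighbor for this k."""
    connected = {v for edge in mutual_neighbor_graph(points, k) for v in edge}
    return [i for i in range(len(points)) if i not in connected]


# Hypothetical toy data: a dense group (indices 0-4), one stray point (5),
# and a small far-away pair (6, 7).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.4, 0.4),
          (5, 5), (10, 10), (10.2, 10)]

for k in (1, 3, 6):
    print(k, isolated_points(points, k))
# 1 [1, 2, 3, 5]
# 3 [5]
# 6 []
```

On this toy data, k = 1 leaves several points of the dense group without a mutual neighbor, k = 3 isolates only the stray point, and k = 6 connects every point. The sketch shows only that the graph structure depends strongly on k; how an appropriate k is searched for and justified is the subject of the paper itself.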
