RKOF: Robust Kernel-Based Local Outlier Detection

Outlier detection is an important and attractive problem in knowledge discovery in large data sets. The majority of the recent work in outlier detection follow the framework of Local Outlier Factor (LOF), which is based on the density estimate theory. However, LOF has two disadvantages that restrict its performance in outlier detection. First, the local density estimate of LOF is not accurate enough to detect outliers in the complex and large databases. Second, the performance of LOF depends on the parameter k that determines the scale of the local neighborhood. Our approach adopts the variable kernel density estimate to address the first disadvantage and the weighted neighborhood density estimate to improve the robustness to the variations of the parameter k, while keeping the same framework with LOF. Besides, we propose a novel kernel function named the Volcano kernel, which is more suitable for outlier detection. Experiments on several synthetic and real data sets demonstrate that our approach not only substantially increases the detection performance, but also is relatively scalable in large data sets in comparison to the state-of-the-art outlier detection methods.

[1]  Peter J. Rousseeuw,et al.  Robust regression and outlier detection , 1987 .

[2]  Jun Gao,et al.  Local Outlier Detection Based on Kernel Regression , 2010, 2010 20th International Conference on Pattern Recognition.

[3]  Vipin Kumar,et al.  Feature bagging for outlier detection , 2005, KDD '05.

[4]  Jian Tang,et al.  Enhancing Effectiveness of Outlier Detections for Low Density Patterns , 2002, PAKDD.

[5]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[6]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[7]  Peter J. Rousseeuw,et al.  Robust Regression and Outlier Detection , 2005, Wiley Series in Probability and Statistics.

[8]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.

[9]  Christos Faloutsos,et al.  LOCI: fast outlier detection using the local correlation integral , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[10]  Sanjay Chawla,et al.  On local spatial outliers , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[11]  Anthony K. H. Tung,et al.  Mining top-n local outliers in large databases , 2001, KDD '01.

[12]  C. D. Kemp,et al.  Density Estimation for Statistics and Data Analysis , 1987 .

[13]  Douglas M. Hawkins Identification of Outliers , 1980, Monographs on Applied Probability and Statistics.

[14]  Aleksandar Lazarevic,et al.  Outlier Detection with Kernel Density Functions , 2007, MLDM.

[15]  Yiyu Yao,et al.  Local peculiarity factor and its application in outlier detection , 2008, KDD.

[16]  Bianca Zadrozny,et al.  Outlier detection by active learning , 2006, KDD '06.

[17]  Jon Louis Bentley,et al.  Multidimensional binary search trees used for associative searching , 1975, CACM.