Efficient Density Based Outlier Handling Technique in Data Mining

Local Outlier Factor (LOF) is a well-known density-based outlier handling algorithm that quantifies how outlying an object is in a given database. In this paper, we first discuss LOF and its variants (LOF’ and LOF”) and then propose an efficient density-based outlier handling algorithm inspired by LOF and LOF’. The algorithm not only uses the density-based notion to discover local outliers but also reduces the number of passes needed to scan the complete database. It calculates the MinPts-dist variance for every object; if the MinPts-dist variance of an object is greater than a specified threshold value, then that object is considered an outlier. The experimental results show that the proposed outlier handling algorithm detects outliers more effectively.
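The thresholding rule described above can be illustrated with a minimal Python sketch. The abstract does not define the MinPts-dist variance, so the sketch rests on an assumption: MinPts-dist of an object is taken to be the distance to its MinPts-th nearest neighbor (as in LOF), and the "variance" is taken to be the mean squared difference between an object's MinPts-dist and the MinPts-dist values of its MinPts nearest neighbors. The function names, the toy data, and the threshold are hypothetical, and the brute-force neighbor search does not attempt to reproduce the reduced number of database passes claimed for the proposed algorithm.

```python
import math


def minpts_dist_scores(points, min_pts):
    """Return (minpts_dist, score) lists for all points.

    ASSUMPTION: score = mean squared difference between a point's
    MinPts-dist and the MinPts-dist of each of its MinPts nearest
    neighbors. The paper's exact definition may differ.
    """
    n = len(points)
    if not 1 <= min_pts < n:
        raise ValueError("min_pts must satisfy 1 <= min_pts < len(points)")

    kdist = []      # MinPts-dist of every point
    neighbors = []  # indices of the MinPts nearest neighbors of every point
    for i, p in enumerate(points):
        # Brute-force distances from p to every other point, ascending.
        d = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        kdist.append(d[min_pts - 1][0])
        neighbors.append([j for _, j in d[:min_pts]])

    scores = [
        sum((kdist[i] - kdist[j]) ** 2 for j in neighbors[i]) / min_pts
        for i in range(n)
    ]
    return kdist, scores


def find_outliers(points, min_pts, threshold):
    """Indices of points whose assumed variance score exceeds the threshold."""
    _, scores = minpts_dist_scores(points, min_pts)
    return [i for i, s in enumerate(scores) if s > threshold]


# Toy data (hypothetical): a 3x3 unit grid plus one distant point at index 9.
points = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
outliers = find_outliers(points, min_pts=3, threshold=1.0)
```

On this toy data the grid points have small scores because their MinPts-dist values are close to those of their neighbors, while the isolated point has a MinPts-dist far larger than that of its nearest neighbors and is the only one flagged. The threshold here was picked by hand for the example; the abstract gives no guidance on how to choose it.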
