Closest neighbors excluded outlier detection

Traditional distance-based outlier detection methods usually take the distances from a point to its nearest neighbors as its outlier degree. Under this definition, if a few points form a small but dense cluster far from the other points, the points in that cluster are unlikely to be detected as outliers, because each of them has close neighbors inside the cluster. In this paper, we propose a new distance-based outlier definition, the Closest Neighbors Excluded (CNE) outlier, together with a corresponding detection algorithm that can detect dense outliers as well as sparse ones. Experimental results show that the CNE algorithm substantially improves accuracy at little cost in efficiency.
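The sketch below illustrates the masking problem described above using the conventional k-nearest-neighbor distance as the outlier degree. It is not the CNE algorithm, whose definition is not given here. The function names (`knn_outlier_degree`, `top_n_outliers`), the toy data set, and the choice of k are illustrative assumptions. With k = 1, the three points of the far-away dense cluster receive the lowest outlier degrees of all, while the single sparse outlier is ranked first. Only when k exceeds the number of other points in the dense cluster (here k = 3) do those points surface, and in practice that cluster size is not known in advance.

```python
import math


def knn_outlier_degree(points, k):
    """Outlier degree of each point = distance to its k-th nearest neighbor.

    The point itself is not counted as one of its own neighbors.
    Brute force, O(n^2 log n); fine for a toy example only.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_n_outliers(points, k, n):
    """Indices of the n points with the largest k-NN outlier degree."""
    scores = knn_outlier_degree(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


# Toy data (illustrative only):
#   indices 0-24:  a 5x5 grid of "normal" points with unit spacing
#   indices 25-27: a small, dense cluster far from everything else
#   index 28:      a single sparse outlier
normal = [(float(x), float(y)) for x in range(5) for y in range(5)]
dense_cluster = [(50.0, 50.0), (50.1, 50.0), (50.0, 50.1)]
sparse_outlier = [(0.0, 30.0)]
points = normal + dense_cluster + sparse_outlier

if __name__ == "__main__":
    # k = 1: each dense-cluster point has a neighbor 0.1 away, so the
    # cluster is masked and does not appear among the top outliers.
    print(top_n_outliers(points, k=1, n=4))
    # k = 3: the 3rd neighbor of a dense-cluster point lies outside the
    # cluster, so the whole cluster plus the sparse outlier rank highest.
    print(top_n_outliers(points, k=3, n=4))
```

The brute-force distance computation keeps the example short; real distance-based detectors typically rely on indexing or pruning to avoid comparing every pair of points.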
