Weight-based method for inside outlier detection

Abstract

Outlier detection is becoming increasingly important in real-world applications such as network intrusion detection and credit card fraud detection. In this paper, a weight-based method is proposed for inside outlier detection. Building on the concepts of density and volume, a weight is defined and used to construct a new measure of outlier-ness. First, the total weight of a given object p and its neighbors is computed from their volume and average density. Then, the estimated weight of the neighbors is obtained from the neighborhood's volume and p's density. If the total weight is not close to the estimated weight, p is flagged as an outlier. In low dimensions, the weight-based method detects inside outliers better than LOF. Moreover, the proposed method performs as well as LOD in high-dimensional spaces or when no inside outliers exist.
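The two-step comparison above can be illustrated with a small sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration, not the authors' definition. The sketch takes N_k(p) to be the k nearest neighbors of p. It takes volume(o) to be r_k(o)^d, the k-distance ball volume up to a constant that cancels. It takes density(o) to be k / volume(o). The total weight is mean volume × mean density over {p} ∪ N_k(p), and the estimated weight is volume(p) × density(p). The names `knn` and `weight_scores` are hypothetical. The sketch returns a ratio rather than a yes/no decision, because the abstract does not specify what counts as "not close".

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn(points: List[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i], excluding i itself.

    Brute force; ties are broken by original index order (stable sort).
    """
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def weight_scores(points: List[Point], k: int) -> List[float]:
    """Hypothetical weight-based outlier-ness score (assumed formulas).

    volume(o)  = r_k(o) ** d, the k-distance ball volume up to a constant factor
    density(o) = k / volume(o), i.e. k objects spread over that volume
    total(p)   = mean volume * mean density over {p} plus its k neighbours
    est(p)     = volume(p) * density(p)
    score(p)   = total(p) / est(p); close to 1 for inliers, large for outliers
    """
    n, d = len(points), len(points[0])
    neigh = [knn(points, i, k) for i in range(n)]
    # k-distance: distance from each object to its k-th nearest neighbour.
    r_k = [math.dist(points[i], points[neigh[i][-1]]) for i in range(n)]
    # Floor guards against zero volume when a point has k exact duplicates.
    volume = [max(r ** d, 1e-300) for r in r_k]
    density = [k / v for v in volume]

    scores = []
    for i in range(n):
        group = [i] + neigh[i]
        mean_vol = sum(volume[j] for j in group) / len(group)
        mean_den = sum(density[j] for j in group) / len(group)
        total = mean_vol * mean_den         # assumed "total weight" of p and its neighbours
        estimated = volume[i] * density[i]  # assumed "estimated weight" from p's own density
        scores.append(total / estimated)
    return scores


if __name__ == "__main__":
    # A tight 3x3 grid plus one isolated point.
    pts = [(float(x), float(y)) for x in range(3) for y in range(3)]
    pts.append((10.0, 10.0))
    for p, s in zip(pts, weight_scores(pts, k=3)):
        print(p, round(s, 3))
```

With this particular choice of density, the estimated weight always equals k. The score then reduces to mean(V) × mean(1/V) over p and its neighbors. By the AM-HM inequality this is at least 1, with equality when all volumes agree. For the grid example, the centre point scores exactly 1 and the other grid points score between 1 and 1.125. The isolated point (10, 10) scores about 23.3. Under other definitions of weight, the two quantities would not reduce this way.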
