An Effective Approach to Outlier Detection Based on Centrality and Centre-Proximity

In data mining research, outliers usually represent extreme values that deviate from other observations on data. The significant issue of existing outlier detection methods is that they only consider the object itself not taking its neighbouring objects into account to extract location features. In this paper, we propose an innovative approach to this issue. First, we propose the notions of centrality and centre-proximity for determining the degree of outlierness considering the distribution of all objects. We also propose a novel graph-based algorithm for outlier detection based on the notions. The algorithm solves the problems of existing methods, i.e. the problems of local density, micro-cluster, and fringe objects. We performed extensive experiments in order to confirm the effectiveness and efficiency of our proposed method. The obtained experimental results showed that the proposed method uncovers outliers successfully, and outperforms previous outlier detection methods.

[1]  John Yen,et al.  An incremental approach to building a cluster hierarchy , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[2]  Christian Böhm,et al.  CoCo: coding cost for parameter-free outlier detection , 2009, KDD.

[3]  Maurizio Filippone,et al.  A comparative evaluation of outlier detection algorithms: Experiments and analyses , 2018, Pattern Recognit..

[4]  Sang-Wook Kim,et al.  Analyzing a Korean blogosphere: a social network analysis perspective , 2011, SAC '11.

[5]  Christos Faloutsos,et al.  oddball: Spotting Anomalies in Weighted Graphs , 2010, PAKDD.

[6]  Sridhar Ramaswamy,et al.  Efficient algorithms for mining outliers from large data sets , 2000, SIGMOD '00.

[7]  Hiroki Takakura,et al.  Toward a more practical unsupervised anomaly detection system , 2013, Inf. Sci..

[8]  Raymond T. Ng,et al.  Distance-based outliers: algorithms and applications , 2000, The VLDB Journal.

[9]  A. Madansky Identification of Outliers , 1988 .

[10]  Sang-Wook Kim,et al.  Outlier detection using centrality and center-proximity , 2012, CIKM '12.

[11]  Gerhard-Wilhelm Weber,et al.  A Hybrid Computational Method Based on Convex Optimization for Outlier Problems: Application to Earthquake Ground Motion Prediction , 2016, Informatica.

[12]  Abraham Kandel,et al.  Anomaly detection in web documents using crisp and fuzzy-based cosine clustering methodology , 2007, Inf. Sci..

[13]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[14]  Pang-Ning Tan,et al.  Outrank: a Graph-Based Outlier Detection Framework Using Random Walk , 2008, Int. J. Artif. Intell. Tools.

[15]  Daniel T. Larose,et al.  Discovering Knowledge in Data: An Introduction to Data Mining , 2005 .

[16]  Vipin Kumar,et al.  Chameleon: Hierarchical Clustering Using Dynamic Modeling , 1999, Computer.

[17]  W. R. Buckland,et al.  Outliers in Statistical Data , 1979 .

[18]  VARUN CHANDOLA,et al.  Anomaly detection: A survey , 2009, CSUR.

[19]  Ji Feng,et al.  A non-parameter outlier detection algorithm based on Natural Neighbor , 2016, Knowl. Based Syst..

[20]  Petra Perner,et al.  Data Mining - Concepts and Techniques , 2002, Künstliche Intell..

[21]  Christos Faloutsos,et al.  LOCI: fast outlier detection using the local correlation integral , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[22]  Christian S. Jensen,et al.  Outlier Detection for Multidimensional Time Series Using Deep Neural Networks , 2018, 2018 19th IEEE International Conference on Mobile Data Management (MDM).

[23]  Hadi Fanaee-T,et al.  Tensor-based anomaly detection: An interdisciplinary survey , 2016, Knowl. Based Syst..

[24]  Stephen D. Bay,et al.  Mining distance-based outliers in near linear time with randomization and a simple pruning rule , 2003, KDD '03.

[25]  Raymond T. Ng,et al.  Finding Intensional Knowledge of Distance-Based Outliers , 1999, VLDB.

[26]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.

[27]  Kit Yan Chan,et al.  Modeling manufacturing processes using a genetic programming-based fuzzy regression with detection of outliers , 2010, Inf. Sci..

[28]  Hwanjo Yu,et al.  DILOF: Effective and Memory Efficient Local Outlier Detection in Data Streams , 2018, KDD.