Particle swarm optimisation for outlier detection

Outlier detection is an important problem because outlying data points often carry crucial information, but identifying them poses several challenges, e.g. noisy data, imprecise boundaries and a lack of training examples. This paper presents a novel approach that converts outlier detection into an optimisation problem and solves it with Particle Swarm Optimisation (PSO), which expands the scope of PSO and offers new insights into outlier detection. Specifically, PSO automatically optimises the key distance measures, replacing the manual setting of distance parameters by trial and error, which is inefficient and often ineffective. The PSO approach is compared with a commonly used detection method, Local Outlier Factor (LOF), on five real data sets. The results show that the new PSO method significantly outperforms LOF at correctly detecting outliers on the majority of the data sets, and that it is more efficient than LOF on the data sets tested.
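To illustrate letting a swarm search for a distance parameter instead of setting it by hand, the following is a minimal sketch and not the method evaluated in the paper. It assumes a basic global-best PSO with an inertia weight (the coefficient values are commonly used defaults), a one-dimensional toy data set, and a hypothetical objective that asks for a neighbourhood radius `r` under which a chosen number of points have no neighbour. The names `pso_minimise`, `flagged`, `objective` and `EXPECTED_OUTLIERS` are introduced here for illustration only.

```python
import random


def pso_minimise(objective, bounds, n_particles=20, n_iters=60,
                 w=0.729, c1=1.49445, c2=1.49445, seed=0):
    """Minimise objective(position) inside box bounds with a global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the particle to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def flagged(points, r):
    """Indices of points that have no other point within distance r."""
    return [i for i, p in enumerate(points)
            if not any(j != i and abs(p - q) <= r for j, q in enumerate(points))]


# Toy data (assumption): a tight cluster near 1.0 and one distant point.
points = [1.0, 1.1, 0.9, 1.2, 1.05, 8.0]
EXPECTED_OUTLIERS = 1  # hypothetical prior on how many outliers to expect


def objective(position):
    """Hypothetical fitness: mismatch between flagged count and the prior."""
    r = position[0]
    return abs(len(flagged(points, r)) - EXPECTED_OUTLIERS)


best, best_val = pso_minimise(objective, bounds=[(0.01, 10.0)])
print("radius:", best[0], "flagged indices:", flagged(points, best[0]))
```

The search replaces a manual sweep over `r`: any radius between the cluster spacing and the gap to the distant point satisfies the toy objective, and the swarm only needs one particle to land in that range. A realistic fitness function would have to be defined without assuming the number of outliers in advance.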
