Outlier Detection using Evolutionary Computing

In this paper, we propose Harmony Search based and Differential Evolution based outlier detection methods for medium-dimensional numerical datasets. The sparsity coefficient is used as the objective function for finding outliers in the data. The upper limit on the number of dimensions for a dataset is fixed using the threshold suggested by Chebyshev's inequality. A t-test is conducted on the optimal sparsity coefficients obtained by both methods over 30 simulations. At the 1% level of significance, the t-test confirmed that the Harmony Search based method is statistically significantly better than the Differential Evolution based one on all four datasets. Both methods outperformed the previous approaches.
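The abstract does not define the sparsity coefficient. As a rough illustration only, the sketch below assumes the definition commonly used in grid-based subspace outlier detection (Aggarwal and Yu, SIGMOD 2001): each dimension is split into `phi` equi-depth ranges, and a k-dimensional cube holding `count` of the `n_records` points is scored against the count expected under independence. The function name and parameter names are hypothetical, and the paper's exact formulation may differ.

```python
import math


def sparsity_coefficient(count, n_records, phi, k):
    """Sparsity coefficient of a k-dimensional grid cube (assumed definition).

    count     -- number of records that fall inside the cube
    n_records -- total number of records in the dataset
    phi       -- number of equi-depth ranges per dimension
    k         -- number of dimensions that define the cube
    """
    f = 1.0 / phi                      # fraction of records in each range
    p = f ** k                         # probability of landing in the cube
    expected = n_records * p           # expected count under independence
    std = math.sqrt(n_records * p * (1.0 - p))
    return (count - expected) / std


# An empty 2-dimensional cube in a 1000-record dataset with 10 ranges per
# dimension gets a clearly negative score, i.e. it is unusually sparse.
score = sparsity_coefficient(count=0, n_records=1000, phi=10, k=2)
```

Under this definition, strongly negative values mark cubes that contain far fewer points than expected, so a search method such as Harmony Search or Differential Evolution would minimize this quantity over candidate cubes.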
