A Novel Approach to Statistical Comparison of Meta-Heuristic Stochastic Optimization Algorithms Using Deep Statistics

Abstract In this paper, a novel approach for the statistical comparison of meta-heuristic stochastic optimization algorithms over multiple single-objective problems is introduced, in which a new ranking scheme is proposed to obtain the data used across multiple problems. The main contribution of this approach is that the ranking scheme is based on the whole distribution of results, instead of on a single statistic that describes the distribution, such as the average or the median. Averages are sensitive to outliers (i.e., the poor runs of a stochastic optimization algorithm), which is why medians are sometimes used instead. However, with the common approach based on either averages or medians, the results can still be distorted by the ranking scheme used by some standard statistical tests. This happens when the differences between the averages or medians lie within some ϵ-neighborhood: the algorithms then obtain different ranks even though, given how small the differences between them are, they should be ranked equally. The experimental results obtained on Black-Box Benchmarking 2015 show that our approach gives more robust results than the common approach in cases where the results are affected by outliers or by a misleading ranking scheme.
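To make the ϵ-neighborhood issue concrete, the sketch below contrasts ranking two algorithms by their mean result with a distribution-based view that compares the whole samples of runs. This is a minimal, hypothetical illustration, not the exact ranking scheme proposed in the paper: the variable names, the ϵ tolerance, the use of a two-sample Kolmogorov–Smirnov test, and the 0.05 significance level are all assumptions made for the example.

```python
import numpy as np
from scipy.stats import ks_2samp

# Hypothetical per-run results of two stochastic algorithms on one problem
# (25 independent runs each); their underlying distributions are nearly identical.
rng = np.random.default_rng(42)
runs_a = rng.normal(loc=10.000, scale=0.5, size=25)
runs_b = rng.normal(loc=10.001, scale=0.5, size=25)

mean_a, mean_b = runs_a.mean(), runs_b.mean()
eps = 0.2  # assumed tolerance below which a difference in means is arguably just noise
diff = abs(mean_a - mean_b)

# Common approach: rank by mean -- any nonzero difference breaks the tie,
# even when it lies inside the eps-neighborhood.
ranks_by_mean = (1, 2) if mean_a < mean_b else (2, 1)
print(f"|mean diff| = {diff:.4f}, within eps={eps}: {diff < eps}, "
      f"ranks by mean: {ranks_by_mean}")

# Distribution-based view: compare the whole samples with a two-sample
# Kolmogorov-Smirnov test; if they are statistically indistinguishable,
# the two algorithms share an averaged rank instead of being split.
stat, p_value = ks_2samp(runs_a, runs_b)
ranks_by_distribution = (1.5, 1.5) if p_value > 0.05 else ranks_by_mean
print(f"KS p-value = {p_value:.3f}, ranks by distribution: {ranks_by_distribution}")
```

In this toy setting the rank-by-mean rule always declares a winner, whereas the distribution-based rule only separates the algorithms when the samples give statistical evidence of a real difference.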
