The impact of statistics for benchmarking in evolutionary computation research

Benchmarking in evolutionary computation research is a crucial task that must be carried out properly in order to compare the performance of a newly introduced evolutionary algorithm against that of state-of-the-art algorithms. Benchmarking theory addresses three main questions: which problems to choose, how to set up experiments, and how to evaluate performance. In this paper, we evaluate the impact of several already-established statistical ranking schemes that can be used to assess performance in benchmarking practice for evolutionary computation. Experimental results obtained on Black-Box Optimization Benchmarking (BBOB) 2015 data show that different statistical ranking schemes, applied to the same benchmarking data, can lead to different benchmarking results. For this reason, we examine the merits and issues of each scheme with regard to benchmarking practice.
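As a minimal illustration of this point (a sketch, not taken from the paper; the data matrix and algorithm labels below are hypothetical), the following Python snippet contrasts two common ranking schemes on the same results: Friedman-style average ranks computed per problem, versus a single ranking by error pooled over all problems. One outlier run is enough to make the two schemes disagree.

```python
# Illustrative sketch with hypothetical data: two common ranking schemes
# applied to the same benchmark results can order algorithms differently.
# Rows = benchmark problems, columns = algorithms A, B, C;
# values = final error achieved (lower is better).
import numpy as np
from scipy.stats import rankdata

errors = np.array([
    [0.10, 0.20, 0.30],   # problem 1
    [0.12, 0.22, 0.28],   # problem 2
    [0.11, 9.00, 0.25],   # problem 3: algorithm B fails badly once
])

# Scheme 1: Friedman-style average ranks
# (rank within each problem, then average over problems).
per_problem_ranks = np.apply_along_axis(rankdata, 1, errors)
avg_ranks = per_problem_ranks.mean(axis=0)

# Scheme 2: rank algorithms by mean error pooled over all problems.
mean_error_ranks = rankdata(errors.mean(axis=0))

print("Friedman-style average ranks:", avg_ranks)       # [1.0, 2.33, 2.67]: B ahead of C
print("Pooled mean-error ranks:    ", mean_error_ranks)  # [1.0, 3.0, 2.0]: C ahead of B
```

Here B beats C on two of the three problems, so the rank-based scheme places B second, while the single bad run dominates B's pooled mean error and the mean-based scheme places it last. This is exactly the kind of divergence between ranking schemes that the paper investigates on BBOB 2015 data.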
