Bayesian performance analysis for black-box optimization benchmarking

The most commonly used statistics in Evolutionary Computation (EC) are of the Wilcoxon-Mann-Whitney type, in either its paired or unpaired version. Using such statistics to draw performance comparisons, however, has several known drawbacks. At the same time, Bayesian inference for performance analysis is an emerging statistical tool with the potential to complement the perspective offered by such p-value-based tests. This work demonstrates the practical use of Bayesian inference in a typical EC setting, where several algorithms are to be compared with respect to various performance indicators. Specifically, we examine the performance data of 11 evolutionary algorithms (EAs) over a set of 23 discrete optimization problems in several dimensions. Using these data, and following a brief introduction to the relevant Bayesian inference practice, we demonstrate how to infer the algorithms' probabilities of winning. Apart from fixed-target and fixed-budget results for the individual problems, we also provide an illustrative example for groups of problems. We elaborate on the computational steps, explain the associated uncertainties, and discuss considerations such as the choice of prior distribution and the sample size. For reference, we also report the results of the classical p-value tests.
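To make the core computation concrete, the sketch below estimates each algorithm's probability of winning on a single problem using a Dirichlet-multinomial model over per-run win counts. This is a minimal stand-in for the Plackett-Luce-style ranking models used in the Bayesian analysis literature, not the paper's exact pipeline; the algorithm labels, win counts, and the prior strength `alpha0` are illustrative assumptions.

```python
# Minimal sketch: Bayesian probability-of-winning estimates from win counts.
# The Dirichlet prior is conjugate to the multinomial likelihood, so the
# posterior over the win-probability vector is available in closed form and
# the probability that each algorithm is best is estimated by Monte Carlo.
import numpy as np

rng = np.random.default_rng(seed=42)

algorithms = ["EA-A", "EA-B", "EA-C"]   # hypothetical algorithm labels
wins = np.array([14, 9, 2])             # hypothetical wins over 25 runs
alpha0 = 1.0                            # symmetric Dirichlet prior (uniform over the simplex)

# Posterior over the win-probability vector is Dirichlet(alpha0 + wins).
posterior_samples = rng.dirichlet(alpha0 + wins, size=100_000)

# P(algorithm i has the highest win probability), estimated by Monte Carlo.
best = posterior_samples.argmax(axis=1)
p_best = np.bincount(best, minlength=len(algorithms)) / len(best)

for name, p in zip(algorithms, p_best):
    print(f"P({name} wins) ~ {p:.3f}")
```

Increasing `alpha0` expresses a stronger prior belief that all algorithms win equally often, which is one simple way to probe the prior-sensitivity and sample-sizing considerations mentioned above.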
