Sample size calculations for the experimental comparison of multiple algorithms on multiple problem instances

This work presents a statistically principled method for estimating the required number of instances in the experimental comparison of multiple algorithms on a given problem class of interest. The approach generalises earlier results by allowing researchers to design experiments based on the desired best-, worst-, mean- or median-case statistical power to detect differences between algorithms larger than a given threshold. Holm's step-down procedure is used to keep the overall significance level controlled at the desired value, without resulting in overly conservative experiments. The paper also presents an approach for sampling each algorithm on each instance, based on optimal sample size ratios that minimise the total number of runs required subject to a desired accuracy in the estimation of paired differences. A case study comparing 21 variants of a custom-tailored Simulated Annealing algorithm on a class of scheduling problems illustrates the application of the proposed methods for sample size calculations in the experimental comparison of algorithms.
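
The two calculations described above (the number of instances needed for a desired power under Holm's step-down correction, and the ratio of runs per algorithm on each instance that minimises total sampling effort for a target accuracy) can be sketched in a few lines. The Python sketch below is a minimal illustration under simplifying assumptions (normally distributed paired differences and a z-approximation instead of the exact t-based iteration), not the authors' implementation; the function names and the example parameter values are hypothetical.

```python
# Minimal sketch: approximate design calculations for comparing K pairs of
# algorithms with Holm's step-down correction, assuming normally distributed
# paired differences and a z-approximation (an exact method would iterate
# with t quantiles). Function names and example values are illustrative.
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def n_instances(d, alpha, power):
    """Approximate number of instances for a paired test to detect a
    standardized mean difference of at least d (two-sided) with the given
    significance level and power."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)
    return math.ceil(((z_a + z_b) / d) ** 2)

def holm_design_sizes(d, alpha, power, K):
    """Sample sizes implied by the K Holm-adjusted levels alpha/K, ...,
    alpha/1. The largest value corresponds to the worst-case power
    requirement, the smallest to the best case."""
    return [n_instances(d, alpha / (K - i), power) for i in range(K)]

def optimal_run_ratio(s1, s2):
    """Ratio n1/n2 of runs per instance that minimises n1 + n2 for a target
    standard error of the estimated simple difference of means, given the
    within-instance standard deviations s1 and s2 (classical optimal
    allocation: n1/n2 = s1/s2)."""
    return s1 / s2

if __name__ == "__main__":
    # Detect standardized differences d >= 0.5 with 80% power at a
    # familywise significance level of 0.05 across K = 20 comparisons.
    ns = holm_design_sizes(d=0.5, alpha=0.05, power=0.8, K=20)
    print("worst-case n:", max(ns), "| best-case n:", min(ns),
          "| mean-case n:", round(sum(ns) / len(ns)))
    print("optimal n1/n2 for s1=2.0, s2=1.0:", optimal_run_ratio(2.0, 1.0))
```

Roughly speaking, designing for the mean- or median-case power amounts to choosing the number of instances so that the average (or median) of the per-comparison powers across the K Holm-adjusted levels reaches the target, rather than requiring it of the most stringent level alone.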
