Statistical Comparison of Algorithm Performance Through Instance Selection

Empirical performance evaluations, in competitions and scientific publications, play a major role in advancing the state of the art in solving many automated reasoning problems, including SAT, CSP, and Bayesian network structure learning (BNSL). Empirically demonstrating the merit of a new solver usually requires extensive experiments, with computational costs on the order of CPU years. This not only makes it difficult for researchers with limited access to computational resources to test their ideas and publish their work, but also consumes large amounts of energy. We propose an approach for comparing the performance of two algorithms: by performing runs on carefully chosen instances, we obtain a probabilistic statement about which algorithm performs best, trading off the computational cost of running the algorithms against the confidence in the result. We describe a set of methods for this purpose and evaluate their efficacy on diverse datasets from SAT, CSP, and BNSL. On all these datasets, most of our approaches were able to choose the correct algorithm with about 95% accuracy while using less than a third of the CPU time required for a full comparison; the best methods reach this level of accuracy within less than 15% of the CPU time for a full comparison.

2012 ACM Subject Classification: General and reference → Evaluation; Theory of computation → Automated reasoning; Theory of computation → Constraint and logic programming
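The core idea, running both algorithms on a growing set of instances until a statistical test yields a confident winner, can be sketched as follows. This is a minimal illustration only: it uses a simple sign test rather than the paper's actual methods, and the `runtime_a`/`runtime_b` callables and `instances` list are hypothetical placeholders for real solver runs.

```python
import math

def sign_test_pvalue(wins: int, n: int) -> float:
    """Two-sided sign-test p-value: probability, under the null hypothesis
    that each algorithm is equally likely to win on an instance, of a
    result at least as extreme as `wins` out of `n` untied runs."""
    k = max(wins, n - wins)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def sequential_compare(runtime_a, runtime_b, instances,
                       alpha=0.05, max_runs=200):
    """Run both algorithms instance by instance and stop as soon as
    the sign test lets us declare a winner at confidence 1 - alpha.
    Returns (winner, number_of_untied_runs); winner is None if the
    run budget is exhausted without a confident decision."""
    wins_a = n = 0
    for inst in instances[:max_runs]:
        ta, tb = runtime_a(inst), runtime_b(inst)
        if ta == tb:
            continue  # ties carry no information for the sign test
        n += 1
        wins_a += ta < tb
        if n >= 5 and sign_test_pvalue(wins_a, n) < alpha:
            return ("A" if wins_a > n - wins_a else "B"), n
    return None, n
```

For example, if algorithm A is faster on every instance tried, the test reaches significance after six untied runs at `alpha=0.05`, so the comparison stops long before the full instance set is evaluated; the paper's contribution is choosing *which* instances to run so that this point is reached with as little CPU time as possible.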
