Achieving COSMOS: a metric for determining when to give up and when to reach for the stars

The utility of current metrics used in genetic programming (GP) systems, such as computational effort and mean best fitness, varies with the problem and with the resource to be optimized. Inferences about the underlying system can be made only when enough runs have been performed to estimate the relevant metric within some confidence interval. This paper proposes a new algorithm for determining the minimum number of independent runs needed to make inferences about a GP system. As such, we view our algorithm as a meta-metric that should be satisfied before any inferences about a system are made. We call this metric COSMOS, as it estimates the number of independent runs needed to achieve the Convergence Of Sample Means Of the Order Statistics. It is agnostic to the underlying GP system and can be used to evaluate extant performance metrics, as well as problem difficulty. We suggest ways in which COSMOS may be used to identify problems that GP may be uniquely qualified to solve.
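The abstract does not give the algorithm itself, but the idea it names — adding independent runs until the sample means of order statistics stabilize — can be illustrated with a minimal sketch. Everything below is hypothetical: `run_gp` is a stand-in for one independent GP run, and the batch size, order-statistic index `k`, and tolerance are placeholder parameters, not values from the paper.

```python
import random

def run_gp(rng):
    """Hypothetical stand-in for one independent GP run.

    Returns a best-fitness value; here it is simply a uniform random
    draw so the sketch is self-contained and runnable.
    """
    return rng.random()

def runs_until_convergence(run_fn, batch_size=20, k=0, tol=1e-3,
                           max_batches=500, seed=1):
    """Illustrative sketch (not the paper's algorithm).

    Collects runs in batches, records the k-th order statistic of each
    batch, and stops once the running sample mean of those order
    statistics changes by less than `tol` between batches. Returns the
    total number of runs consumed.
    """
    rng = random.Random(seed)
    order_stats = []
    prev_mean = None
    for b in range(1, max_batches + 1):
        batch = sorted(run_fn(rng) for _ in range(batch_size))
        order_stats.append(batch[k])  # k-th order statistic of this batch
        mean = sum(order_stats) / len(order_stats)
        if prev_mean is not None and abs(mean - prev_mean) < tol:
            return b * batch_size
        prev_mean = mean
    return max_batches * batch_size
```

The stopping rule here (successive running means within `tol`) is a deliberately naive convergence check; the paper's actual criterion may differ, e.g. by using confidence intervals or bootstrap resampling over the order statistics.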
