What is important about the No Free Lunch theorems?

The No Free Lunch theorems prove that under a uniform distribution over induction problems (search problems or learning problems), all induction algorithms perform equally. As I discuss in this chapter, the importance of the theorems arises from using them to analyze scenarios involving \emph{non-uniform} distributions, and to compare different algorithms without any assumption about the distribution over problems at all. In particular, the theorems prove that \emph{anti}-cross-validation (choosing among a set of candidate algorithms based on which has \emph{worst} out-of-sample behavior) performs as well as cross-validation, unless one makes an assumption -- which has never been formalized -- about how the distribution over induction problems, on the one hand, is related to the set of algorithms one chooses among using (anti-)cross-validation, on the other. In addition, the theorems establish strong caveats concerning the significance of the many results in the literature that establish the strength of a particular algorithm without assuming a particular distribution. They also motivate a ``dictionary'' between supervised learning and blackbox optimization, which allows one to ``translate'' techniques from supervised learning into the domain of blackbox optimization, thereby strengthening blackbox optimization algorithms. In addition to these topics, I briefly discuss the theorems' implications for the philosophy of science.
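The anti-cross-validation claim is easy to check numerically. Below is a minimal simulation sketch, not taken from the chapter: it draws Boolean target functions uniformly at random over a small finite input space, lets cross-validation and anti-cross-validation each choose between two toy learners (one predicting the majority training label off-sample, the other the minority label), and records the average off-training-set error of each choice. All function and parameter names (predict_majority, N_TRIALS, etc.) are illustrative assumptions, not from the cited works.

```python
# Minimal sketch: under a uniform distribution over Boolean target functions,
# choosing a learner by cross-validation or by anti-cross-validation gives
# the same average off-training-set error (about 0.5).
import random

N_X = 12          # size of the finite input space
N_TRAIN = 8       # number of distinct training inputs
N_TRIALS = 20000  # Monte Carlo repetitions

def predict_majority(train_labels):
    """Learner A: predict the most common training label on all off-sample points."""
    return int(sum(train_labels) * 2 >= len(train_labels))

def predict_minority(train_labels):
    """Learner B: predict the least common training label on all off-sample points."""
    return 1 - predict_majority(train_labels)

def loo_error(learner, labels):
    """Leave-one-out cross-validation error of a learner on the training labels."""
    errs = 0
    for i in range(len(labels)):
        held_out = labels[i]
        rest = labels[:i] + labels[i + 1:]
        errs += int(learner(rest) != held_out)
    return errs / len(labels)

def run(choose_worst):
    """Average off-training-set error when selecting a learner by cross-validation
    (choose_worst=False) or by anti-cross-validation (choose_worst=True)."""
    total = 0.0
    for _ in range(N_TRIALS):
        # Uniform distribution over Boolean target functions on N_X points:
        f = [random.randint(0, 1) for _ in range(N_X)]
        train, test = f[:N_TRAIN], f[N_TRAIN:]
        learners = [predict_majority, predict_minority]
        scores = [loo_error(m, train) for m in learners]
        pick = scores.index(max(scores) if choose_worst else min(scores))
        pred = learners[pick](train)
        total += sum(int(pred != y) for y in test) / len(test)
    return total / N_TRIALS

if __name__ == "__main__":
    print("cross-validation      OTS error:", round(run(False), 3))  # ~0.5
    print("anti-cross-validation OTS error:", round(run(True), 3))   # ~0.5
```

The sketch only illustrates the mechanism behind the theorem: because the off-training-set labels are independent of the training data under a uniform prior over target functions, no rule for selecting among the candidate learners can do better or worse than any other on average.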
