The famine of forte: Few search problems greatly favor your algorithm

Casting machine learning as a type of search, we demonstrate that the proportion of problems favorable to any fixed algorithm is strictly bounded, so that no single algorithm can perform well over a large fraction of them. Our results explain why we must either continue to develop new learning methods year after year or move toward highly parameterized models that are both flexible and sensitive to their hyperparameters. We further give an upper bound on the expected performance of a search algorithm as a function of the mutual information between the target and the information resource (e.g., a training dataset), proving the importance of certain types of dependence for machine learning. Lastly, we show that the expected per-query probability of success for an algorithm is mathematically equivalent to the single-query probability of success under a distribution (called a search strategy), and we prove that the proportion of favorable strategies is also strictly bounded. Thus, whether one fixes the search algorithm and considers all possible problems, or fixes the search problem and considers all possible search strategies, favorable matches are exceedingly rare. The forte (strength) of any algorithm is quantifiably restricted.
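The central claim, that favorable problems are rare for any fixed algorithm, can be illustrated with a small self-contained sketch. The code below is not the paper's formal construction: it ignores feedback and external information resources, and the bound it reports is only the elementary Markov-inequality form of the underlying averaging argument. It enumerates every size-k target set in a toy search space, measures how often a fixed blind search attains a per-query success probability of at least q_min, and compares that fraction to p / q_min, where p = k / |Omega| is the success rate of uniform random guessing. All names and parameter values are illustrative assumptions.

```python
import itertools

# Toy illustration (simplified, feedback-free): a finite search space,
# a fixed blind search algorithm, and an exhaustive sweep over all
# size-k target sets. The fraction of targets on which the algorithm's
# per-query success probability reaches q_min stays below p / q_min,
# where p = k / |Omega| is the success rate of uniform random guessing.

n, k, q_min = 12, 3, 0.5           # |Omega|, target size, success threshold
omega = list(range(n))

# Fixed algorithm: spends its whole query budget on a preferred region
# (here it repeatedly probes the first three points of the space).
queries = [0, 1, 2, 0, 1, 2]       # fixed query sequence, length m = 6

def per_query_success(target):
    """Fraction of the fixed query sequence that lands inside the target."""
    hits = sum(1 for x in queries if x in target)
    return hits / len(queries)

targets = list(itertools.combinations(omega, k))
favorable = [t for t in targets if per_query_success(set(t)) >= q_min]

p = k / n                          # baseline: uniform random sampling
print(f"favorable fraction : {len(favorable) / len(targets):.4f}")
print(f"Markov-style bound : {p / q_min:.4f}")
```

Raising q_min or shrinking k / |Omega| tightens the bound, matching the abstract's claim that the more favorable a problem must be, the rarer such problems become for any fixed algorithm.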
