A Bayesian formulation of search, control and the exploration/exploitation trade-off

A new approach to optimisation is introduced, based on a precise probabilistic statement of what is ideally required of an optimisation method. It is convenient to express the formalism in terms of the control of a stationary environment. This leads to an objective function for the controller that unifies the objectives of exploration and exploitation, thereby providing a quantitative principle for managing the trade-off between them. The principle is demonstrated using a variant of the multi-armed bandit problem. The approach opens new possibilities for optimisation algorithms, particularly through the use of neural networks or other adaptive methods to implement the controller. It also offers a way to deepen understanding of existing methods. Realising these possibilities requires research into practical approximations of the exact formalism.
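As a rough illustration of how a single Bayesian objective can cover both exploration and exploitation, the sketch below computes the expected total reward of the Bayes-optimal policy for a small Bernoulli bandit over a finite horizon. It is not the formalism or the bandit variant of the paper: the Bernoulli rewards, the uniform Beta(1,1) priors, the finite horizon, and the name `bayes_optimal_value` are all assumptions made for the example.

```python
from functools import lru_cache


def bayes_optimal_value(horizon, n_arms=2):
    """Expected total reward of the Bayes-optimal policy.

    Assumptions (illustrative only): each arm pays 0 or 1 with an unknown
    success probability, and each probability has a uniform Beta(1,1) prior.
    """

    @lru_cache(maxsize=None)
    def value(state, steps_left):
        # state holds one (successes, failures) pair per arm.
        if steps_left == 0:
            return 0.0
        best = 0.0
        for i, (s, f) in enumerate(state):
            # Posterior mean of arm i under the Beta(1,1) prior.
            p = (s + 1) / (s + f + 2)
            win = state[:i] + ((s + 1, f),) + state[i + 1:]
            lose = state[:i] + ((s, f + 1),) + state[i + 1:]
            # Immediate reward plus the value of acting optimally on the
            # updated beliefs; the second part is where information pays off.
            q = p * (1.0 + value(win, steps_left - 1)) \
                + (1.0 - p) * value(lose, steps_left - 1)
            best = max(best, q)
        return best

    start = tuple((0, 0) for _ in range(n_arms))
    return value(start, horizon)


if __name__ == "__main__":
    for h in (1, 2, 5):
        print(h, bayes_optimal_value(h))
```

With two arms and two pulls, the value exceeds the 1.0 that a policy ignoring its first observation would expect, because the second pull can respond to what the first one revealed. No separate exploration bonus appears anywhere in the code; the gain comes entirely from maximising expected reward under the posterior.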
