A Bayesian formulation of search, control and the exploration/exploitation trade-off
[1] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[2] David J. C. MacKay, et al. Information-Based Objective Functions for Active Data Selection, 1992, Neural Computation.
[3] F. A. Seiler, et al. Numerical Recipes in C: The Art of Scientific Computing, 1989.
[4] Fred Glover, et al. Tabu Search - Part II, 1989, INFORMS J. Comput.
[5] J. Berger. Statistical Decision Theory and Bayesian Analysis, 1988.
[6] Jean Walrand, et al. Extensions of the multiarmed bandit problem: The discounted case, 1985.
[7] R. Keener. Further Contributions to the "Two-Armed Bandit" Problem, 1985.
[8] C. D. Gelatt, et al. Optimization by Simulated Annealing, 1983, Science.
[9] P. Kumar, et al. On the optimal solution of the one-armed bandit adaptive control problem, 1981.
[10] Philip E. Gill, et al. Practical optimization, 1981.
[11] Gerald S. Rogers, et al. Mathematical Statistics: A Decision Theoretic Approach, 1967.
[12] R. Howard. Dynamic Programming and Markov Processes, 1960.
[13] D. Lindley. On a Measure of the Information Provided by an Experiment, 1956.
[14] E. Rowland. Theory of Games and Economic Behavior, 1946, Nature.
[15] Fred W. Glover, et al. Tabu Search - Part I, 1989, INFORMS J. Comput.
[16] P. Whittle. Multi-Armed Bandits and the Gittins Index, 1980.
[17] John H. Holland, et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, 1992.
[18] L. M. M.-T. Theory of Probability, 1929, Nature.
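Several of the cited works ([6], [7], [9], [16]) treat the multi-armed bandit problem, the classical Bayesian setting for the exploration/exploitation trade-off named in the title. Purely as an illustration of that setting, and not as the formulation of the paper itself, the sketch below applies Thompson sampling to a two-armed Bernoulli bandit; the arm success rates, horizon, and Beta(1, 1) priors are assumptions chosen for the example.

```python
import random

# Illustrative sketch only: a two-armed Bernoulli bandit with hypothetical
# success probabilities. Thompson sampling keeps a Beta posterior over each
# arm's success rate and plays the arm whose sampled rate is largest.

TRUE_RATES = [0.4, 0.6]   # unknown to the agent; invented for this example
HORIZON = 1000

# Beta(1, 1) priors: alpha counts successes + 1, beta counts failures + 1.
alpha = [1.0, 1.0]
beta = [1.0, 1.0]

total_reward = 0
for t in range(HORIZON):
    # Sample a plausible success rate for each arm from its posterior.
    samples = [random.betavariate(alpha[i], beta[i]) for i in range(2)]
    arm = max(range(2), key=lambda i: samples[i])

    # Pull the chosen arm and observe a Bernoulli reward.
    reward = 1 if random.random() < TRUE_RATES[arm] else 0
    total_reward += reward

    # Conjugate Bayesian update of the chosen arm's Beta posterior.
    alpha[arm] += reward
    beta[arm] += 1 - reward

print("total reward:", total_reward)
print("posterior means:", [alpha[i] / (alpha[i] + beta[i]) for i in range(2)])
```

Sampling from the posterior, rather than always playing the arm with the highest posterior mean, is what lets exploration arise directly from the Bayesian uncertainty: arms with wide posteriors are occasionally sampled optimistically, while well-understood good arms are exploited most of the time.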