Paper Information - Multi-Armed Bandit Bayesian Decision Making - 字舞流文

Multi-Armed Bandit Bayesian Decision Making

Stephen Roberts | R. E. McInerney

[1] W. James. The Principles of Psychology, Vol. I, 2008.

[2] John McCarthy, et al. A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence, August 31, 1955, 2006, AI Mag.

[3] W. D. Penny, et al. Real-time brain-computer interfacing: A preliminary study using Bayesian learning, 2006, Medical and Biological Engineering and Computing.

[4] Mehryar Mohri, et al. Multi-armed Bandit Algorithms and Empirical Evaluation, 2005, ECML.

[5] L. Merabet, et al. The plastic human brain cortex, 2005, Annual review of neuroscience.

[6] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[7] Martin J. Osborne, et al. An Introduction to Game Theory, 2003.

[8] M. Tribus, et al. Probability theory: the logic of science, 2003.

[9] Shie Mannor, et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes, 2002, COLT.

[10] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[11] Nicolò Cesa-Bianchi, et al. Finite-Time Regret Bounds for the Multiarmed Bandit Problem, 1998, ICML.

[12] David H. Wolpert, et al. Bandit problems and the exploration/exploitation tradeoff, 1998, IEEE Trans. Evol. Comput.

[13] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[14] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.

[15] R. McKelvey, et al. Quantal Response Equilibria for Normal Form Games, 1995.

[16] John R. Kirby, et al. Intelligence and Social Policy, 1995.

[17] L. Kaelbling. Learning in embedded systems, 1993.

[18] Manfred K. Warmuth, et al. The weighted majority algorithm, 1989, 30th Annual Symposium on Foundations of Computer Science.

[19] Jean Walrand, et al. Extensions of the multiarmed bandit problem: The discounted case, 1985.

[20] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.

[21] J. Gittins, et al. A dynamic allocation index for the discounted multiarmed bandit problem, 1979.

[22] J. Gittins. Bandit processes and dynamic allocation indices, 1979.

[23] R. Duncan Luce, et al. Individual Choice Behavior, 1959.

[24] H. Robbins. Some aspects of the sequential design of experiments, 1952.

[25] J. Neumann, et al. Theory of games and economic behavior, 1945, 100 Years of Math Milestones.

[26] W. R. Thompson. On the Theory of Apportionment, 1935.

[27] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, 1933.

[28] W. James. The principles of psychology, 1983.