Multi-Armed Bandit Bayesian Decision Making
