An analytic solution to discrete Bayesian reinforcement learning
Jesse Hoey | Pascal Poupart | Nikos A. Vlassis | Kevin Regan
[1] M. DeGroot. Optimal Statistical Decisions, 1970.
[2] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.
[3] Leslie Pack Kaelbling, et al. Learning in Embedded Systems, 1993.
[4] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[5] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, CACM.
[6] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[7] Andrew G. Barto, et al. Reinforcement Learning, 1998.
[8] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[9] David Andre, et al. Model-Based Bayesian Exploration, 1999, UAI.
[10] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[11] Andrew G. Barto, et al. Optimal Learning: Computational Procedures for Bayes-Adaptive Markov Decision Processes, 2002.
[12] S. Shankar Sastry, et al. Autonomous Helicopter Flight via Reinforcement Learning, 2003, NIPS.
[13] Michael O. Duff, et al. Design for an Optimal Probe, 2003, ICML.
[14] Ben Tse, et al. Autonomous Inverted Helicopter Flight via Reinforcement Learning, 2004, ISER.
[15] Paul Bourgine, et al. Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty, 1999, Machine Learning.
[16] Nikos A. Vlassis, et al. Perseus: Randomized Point-based Value Iteration for POMDPs, 2005, J. Artif. Intell. Res.
[17] Nikos A. Vlassis, et al. Robot Planning in Partially Observable Continuous Domains, 2005, BNAIC.
[18] Jesse Hoey, et al. A Decision-Theoretic Approach to Task Assistance for Persons with Dementia, 2005, IJCAI.
[19] Tao Wang, et al. Bayesian Sparse Sampling for On-line Reward Optimization, 2005, ICML.