Model-based Bayesian Reinforcement Learning in Partially Observable Domains
暂无分享,去创建一个
[1] Michael I. Jordan. Learning in Graphical Models , 1999, NATO ASI Series.
[2] Pascal Poupart,et al. Point-Based Value Iteration for Continuous POMDPs , 2006, J. Mach. Learn. Res..
[3] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon , 1973, Oper. Res..
[4] David Heckerman,et al. A Tutorial on Learning with Bayesian Networks , 1999, Innovations in Bayesian Networks.
[5] Malcolm J. A. Strens,et al. A Bayesian Framework for Reinforcement Learning , 2000, ICML.
[6] S. Shankar Sastry,et al. Autonomous Helicopter Flight via Reinforcement Learning , 2003, NIPS.
[7] Andrew Y. Ng,et al. Policy Search via Density Estimation , 1999, NIPS.
[8] Jesse Hoey,et al. An analytic solution to discrete Bayesian reinforcement learning , 2006, ICML.
[9] Michael I. Jordan,et al. PEGASUS: A policy search method for large MDPs and POMDPs , 2000, UAI.
[10] Doina Precup,et al. Using Linear Programming for Bayesian Exploration in Markov Decision Processes , 2007, IJCAI.
[11] Kee-Eung Kim,et al. Learning Finite-State Controllers for Partially Observable Environments , 1999, UAI.
[12] Joelle Pineau,et al. Active Learning in Partially Observable Markov Decision Processes , 2005, ECML.
[13] Andrew G. Barto,et al. Optimal learning: computational procedures for bayes-adaptive markov decision processes , 2002 .
[14] David Andre,et al. Model based Bayesian Exploration , 1999, UAI.
[15] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[16] Douglas Aberdeen,et al. Scalable Internal-State Policy-Gradient Methods for POMDPs , 2002, ICML.
[17] Tao Wang,et al. Bayesian sparse sampling for on-line reward optimization , 2005, ICML.
[18] Peter Stone,et al. Policy gradient reinforcement learning for fast quadrupedal locomotion , 2004, IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004.
[19] Yishay Mansour,et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.
[20] Jonathan Baxter,et al. Scaling Internal-State Policy-Gradient Methods for POMDPs , 2002 .