A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
Joelle Pineau | Stéphane Ross | Brahim Chaib-draa | Pierre Kreitmann
[1] Richard Bellman,et al. Adaptive Control Processes: A Guided Tour , 1961, The Mathematical Gazette.
[2] Doina Precup,et al. Using Linear Programming for Bayesian Exploration in Markov Decision Processes , 2007, IJCAI.
[3] Peter Auer,et al. Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning , 2006, NIPS.
[4] Pascal Poupart,et al. Model-based Bayesian Reinforcement Learning in Partially Observable Domains , 2008, ISAIM.
[5] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[6] N. Filatov,et al. Survey of adaptive dual control methods , 2000 .
[7] Andrew G. Barto,et al. Optimal learning: computational procedures for bayes-adaptive markov decision processes , 2002 .
[8] Jesse Hoey,et al. An analytic solution to discrete Bayesian reinforcement learning , 2006, ICML.
[9] Joelle Pineau,et al. Online Planning Algorithms for POMDPs , 2008, J. Artif. Intell. Res..
[10] Michael O. Duff,et al. Monte-Carlo Algorithms for the Improvement of Finite-State Stochastic Controllers: Application to Bayes-Adaptive Markov Decision Processes , 2001, AISTATS.
[11] Edward J. Sondik,et al. The optimal control of partially observable Markov processes , 1971 .
[12] Joelle Pineau,et al. Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs , 2008, ICML '08.
[13] Ilan Rusnak. Optimal Adaptive Control of Uncertain Stochastic Discrete Linear Systems , 1999 .
[14] Joel Veness,et al. Monte-Carlo Planning in Large POMDPs , 2010, NIPS.
[15] Finale Doshi-Velez,et al. The Infinite Partially Observable Markov Decision Process , 2009, NIPS.
[16] Andrew Y. Ng,et al. Near-Bayesian exploration in polynomial time , 2009, ICML '09.
[17] L. M. M.-T. Theory of Probability , 1929, Nature.
[18] Brahim Chaib-draa,et al. An online POMDP algorithm for complex multiagent environments , 2005, AAMAS '05.
[19] Marcus Hutter. Universal artificial intelligence , 2004 .
[20] Andrew McCallum,et al. Reinforcement learning with selective perception and hidden state , 1996 .
[21] Ambuj Tewari,et al. Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs , 2007, NIPS.
[22] Tao Wang,et al. Bayesian sparse sampling for on-line reward optimization , 2005, ICML.
[23] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[24] Shie Mannor,et al. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning , 2003, ICML.
[25] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[26] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon , 1973, Oper. Res..
[27] O. Zane. Discrete-time Bayesian adaptive control problems with complete observations , 1992, [1992] Proceedings of the 31st IEEE Conference on Decision and Control.
[28] Joelle Pineau,et al. Point-based value iteration: An anytime algorithm for POMDPs , 2003, IJCAI.
[29] G. Casella,et al. Statistical Inference , 2003, Encyclopedia of Social Network Analysis and Mining.
[30] Malcolm J. A. Strens,et al. A Bayesian Framework for Reinforcement Learning , 2000, ICML.
[31] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[32] Joel Veness,et al. A Monte-Carlo AIXI Approximation , 2009, J. Artif. Intell. Res..
[33] Reid G. Simmons,et al. Heuristic Search Value Iteration for POMDPs , 2004, UAI.
[34] Arnaud Doucet,et al. Sequential Monte Carlo Methods , 2006, Handbook of Graphical Models.
[35] Edwin T. Jaynes. Prior Probabilities , 2010, Encyclopedia of Machine Learning.
[36] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..
[37] Sean P. Meyn,et al. Bayesian adaptive control of time varying systems , 1992, [1992] Proceedings of the 31st IEEE Conference on Decision and Control.
[38] Michael L. Littman,et al. A theoretical analysis of Model-Based Interval Estimation , 2005, ICML.
[39] Shie Mannor,et al. Percentile optimization in uncertain Markov decision processes with application to efficient exploration , 2007, ICML '07.
[40] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[41] Shie Mannor,et al. Reinforcement learning with Gaussian processes , 2005, ICML.
[42] Richard E. Neapolitan,et al. Learning Bayesian networks , 2007, KDD '07.
[43] Joelle Pineau,et al. Bayes-Adaptive POMDPs , 2007, NIPS.
[44] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[45] Lihong Li,et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning , 2009, UAI.
[46] A. Greenfield,et al. Adaptive Control of Nonlinear Stochastic Systems by Particle Filtering , 2003, 2003 4th International Conference on Control and Automation Proceedings.
[47] Brahim Chaib-draa,et al. Bayesian reinforcement learning in continuous POMDPs with gaussian processes , 2009, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[48] A. Guez,et al. Optimal adaptive control of uncertain stochastic linear systems , 1995, 1995 IEEE International Conference on Systems, Man and Cybernetics. Intelligent Systems for the 21st Century.
[49] Yishay Mansour,et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.
[50] Timothy J. Robinson,et al. Sequential Monte Carlo Methods in Practice , 2003 .
[51] Nikos A. Vlassis,et al. Perseus: Randomized Point-based Value Iteration for POMDPs , 2005, J. Artif. Intell. Res..
[52] Richard S. Sutton,et al. Predictive Representations of State , 2001, NIPS.
[53] David Maxwell Chickering,et al. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data , 1994, Machine Learning.
[54] Joelle Pineau,et al. Active Learning in Partially Observable Markov Decision Processes , 2005, ECML.
[55] David Andre,et al. Model based Bayesian Exploration , 1999, UAI.
[56] Stuart J. Russell,et al. Bayesian Q-Learning , 1998, AAAI/IAAI.
[57] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[58] Mohammad Ghavamzadeh,et al. Bayesian actor-critic algorithms , 2007, ICML '07.
[59] Mohammad Ghavamzadeh,et al. Bayesian Policy Gradient Algorithms , 2006, NIPS.