Reinforcement Learning with Limited Reinforcement: Using Bayes Risk for Active Learning in POMDPs
Finale Doshi-Velez | Joelle Pineau | Nicholas Roy
[1] Edward J. Sondik, et al. The optimal control of partially observable Markov processes, 1971.
[2] E. J. Sondik, et al. The Optimal Control of Partially Observable Markov Decision Processes, 1971.
[3] C. Watkins. Learning from delayed rewards, 1989.
[4] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.
[5] Lonnie Chrisman, et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.
[6] Leslie Pack Kaelbling, et al. Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.
[7] Leslie Pack Kaelbling, et al. Acting under uncertainty: discrete Bayesian models for mobile-robot navigation, 1996, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '96).
[8] David Andre, et al. Model based Bayesian Exploration, 1999, UAI.
[9] Joelle Pineau, et al. Spoken Dialogue Management Using Probabilistic Reasoning, 2000, ACL.
[10] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[11] Milos Hauskrecht, et al. Value-Function Approximations for Partially Observable Markov Decision Processes, 2000, J. Artif. Intell. Res.
[12] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[13] S. L. Scott. Bayesian Methods for Hidden Markov Models, 2002.
[14] P. Del Moral, et al. Sequential Monte Carlo samplers, 2002, cond-mat/0212648.
[15] Craig Boutilier, et al. A POMDP formulation of preference elicitation problems, 2002, AAAI/IAAI.
[16] Joelle Pineau, et al. Point-based value iteration: An anytime algorithm for POMDPs, 2003, IJCAI.
[17] L. El Ghaoui, et al. Robust Markov decision processes with uncertain transition matrices, 2004.
[18] Nikos A. Vlassis, et al. A point-based POMDP algorithm for robot planning, 2004, IEEE International Conference on Robotics and Automation (ICRA '04).
[19] P. Pu, et al. Survey of Preference Elicitation Methods, 2004, Technical Report IC/2004/67.
[20] Joelle Pineau, et al. Active Learning in Partially Observable Markov Decision Processes, 2005, ECML.
[21] Yossi Aviv, et al. A Partially Observed Markov Decision Process for Dynamic Pricing, 2005, Manag. Sci.
[22] Yishay Mansour, et al. Reinforcement Learning in POMDPs Without Resets, 2005, IJCAI.
[23] J. D. Williams, et al. Scaling up POMDPs for Dialog Management: The "Summary POMDP" Method, 2005, IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[24] B. White. An economic analysis of ecological monitoring, 2005.
[25] Jesse Hoey, et al. POMDP Models for Assistive Technology, 2005, AAAI Fall Symposium: Caring Machines.
[26] Guy Shani, et al. Model-Based Online Learning of POMDPs, 2005, ECML.
[27] Pascal Poupart, et al. Point-Based Value Iteration for Continuous POMDPs, 2006, J. Mach. Learn. Res.
[28] Jesse Hoey, et al. An analytic solution to discrete Bayesian reinforcement learning, 2006, ICML.
[29] Lihong Li, et al. Incremental Model-based Learners With Formal Learning-Time Guarantees, 2006, UAI.
[30] Joelle Pineau, et al. Bayes-Adaptive POMDPs, 2007, NIPS.
[31] Guy Shani, et al. Forward Search Value Iteration for POMDPs, 2007, IJCAI.
[32] Hideaki Itoh, et al. Partially observable Markov decision processes with imprecise parameters, 2007, Artif. Intell.
[33] Nicholas Roy, et al. Efficient model learning for dialog management, 2007, 2nd ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[34] Steve J. Young, et al. Partially observable Markov decision processes for spoken dialog systems, 2007, Comput. Speech Lang.
[35] Christophe Andrieu, et al. A tutorial on adaptive MCMC, 2008, Stat. Comput.
[36] Pascal Poupart, et al. Model-based Bayesian Reinforcement Learning in Partially Observable Domains, 2008, ISAIM.
[37] David Hsu, et al. SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces, 2008, Robotics: Science and Systems.
[38] Nicholas Roy, et al. Spoken language interaction with model uncertainty: an adaptive human–robot interaction system, 2008, Connect. Sci.
[39] Joelle Pineau, et al. Bayesian reinforcement learning in continuous POMDPs with application to robot navigation, 2008, IEEE International Conference on Robotics and Automation (ICRA).
[40] Joelle Pineau, et al. Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs, 2008, ICML '08.
[41] Andrew Y. Ng, et al. Near-Bayesian exploration in polynomial time, 2009, ICML '09.
[42] Brett Browning, et al. A survey of robot learning from demonstration, 2009, Robotics Auton. Syst.
[43] Lihong Li, et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning, 2009, UAI.
[44] Joelle Pineau, et al. A Bayesian reinforcement learning approach for customizing human-robot interfaces, 2009, IUI.
[45] Kee-Eung Kim, et al. Inverse Reinforcement Learning in Partially Observable Environments, 2009, IJCAI.
[46] Manuel Lopes, et al. Active Learning for Reward Estimation in Inverse Reinforcement Learning, 2009, ECML/PKDD.
[47] Masoumeh T. Izadi, et al. Sensitivity Analysis of POMDP Value Functions, 2009, International Conference on Machine Learning and Applications (ICMLA).
[48] Manuela M. Veloso, et al. Interactive Policy Learning through Confidence-Based Autonomy, 2009, J. Artif. Intell. Res.
[49] Richard L. Lewis, et al. Variance-Based Rewards for Approximate Bayesian Reinforcement Learning, 2010, UAI.
[50] Thomas J. Walsh, et al. Generalizing Apprenticeship Learning across Hypothesis Classes, 2010, ICML.
[51] Peter Stone, et al. Combining manual feedback with subsequent MDP reward signals for reinforcement learning, 2010, AAMAS.
[52] Joel Veness, et al. Monte-Carlo Planning in Large POMDPs, 2010, NIPS.