Reinforcement Learning with Limited Reinforcement: Using Bayes Risk for Active Learning in POMDPs
Finale Doshi, Joelle Pineau, Nicholas Roy

[1]  Edward J. Sondik, et al.  The optimal control of partially observable Markov processes, 1971.

[2]  E. J. Sondik, et al.  The Optimal Control of Partially Observable Markov Decision Processes, 1971.

[3]  C. Watkins.  Learning from delayed rewards, 1989.

[4]  Lawrence R. Rabiner, et al.  A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[5]  Lonnie Chrisman, et al.  Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.

[6]  Leslie Pack Kaelbling, et al.  Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.

[7]  Leslie Pack Kaelbling, et al.  Acting under uncertainty: discrete Bayesian models for mobile-robot navigation, 1996, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '96).

[8]  David Andre, et al.  Model based Bayesian Exploration, 1999, UAI.

[9]  Joelle Pineau, et al.  Spoken Dialogue Management Using Probabilistic Reasoning, 2000, ACL.

[10]  Malcolm J. A. Strens, et al.  A Bayesian Framework for Reinforcement Learning, 2000, ICML.

[11]  Milos Hauskrecht, et al.  Value-Function Approximations for Partially Observable Markov Decision Processes, 2000, J. Artif. Intell. Res.

[12]  Andrew Y. Ng, et al.  Algorithms for Inverse Reinforcement Learning, 2000, ICML.

[13]  S. L. Scott.  Bayesian Methods for Hidden Markov Models, 2002.

[14]  P. Del Moral, et al.  Sequential Monte Carlo samplers, 2002, cond-mat/0212648.

[15]  Craig Boutilier, et al.  A POMDP formulation of preference elicitation problems, 2002, AAAI/IAAI.

[16]  Joelle Pineau, et al.  Point-based value iteration: An anytime algorithm for POMDPs, 2003, IJCAI.

[17]  L. El Ghaoui, et al.  Robust Markov decision processes with uncertain transition matrices, 2004.

[18]  Nikos A. Vlassis, et al.  A point-based POMDP algorithm for robot planning, 2004, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '04).

[19]  P. Pu, et al.  Survey of Preference Elicitation Methods, Technical Report IC/2004/67, 2004.

[20]  Joelle Pineau, et al.  Active Learning in Partially Observable Markov Decision Processes, 2005, ECML.

[21]  Yossi Aviv, et al.  A Partially Observed Markov Decision Process for Dynamic Pricing, 2005, Manag. Sci.

[22]  Yishay Mansour, et al.  Reinforcement Learning in POMDPs Without Resets, 2005, IJCAI.

[23]  J. D. Williams, et al.  Scaling up POMDPs for Dialog Management: The "Summary POMDP" Method, 2005, IEEE Workshop on Automatic Speech Recognition and Understanding.

[24]  B. White.  An economic analysis of ecological monitoring, 2005.

[25]  Jesse Hoey, et al.  POMDP Models for Assistive Technology, 2005, AAAI Fall Symposium: Caring Machines.

[26]  Guy Shani, et al.  Model-Based Online Learning of POMDPs, 2005, ECML.

[27]  Pascal Poupart, et al.  Point-Based Value Iteration for Continuous POMDPs, 2006, J. Mach. Learn. Res.

[28]  Jesse Hoey, et al.  An analytic solution to discrete Bayesian reinforcement learning, 2006, ICML.

[29]  Lihong Li, et al.  Incremental Model-based Learners With Formal Learning-Time Guarantees, 2006, UAI.

[30]  Joelle Pineau, et al.  Bayes-Adaptive POMDPs, 2007, NIPS.

[31]  Guy Shani, et al.  Forward Search Value Iteration for POMDPs, 2007, IJCAI.

[32]  Hideaki Itoh, et al.  Partially observable Markov decision processes with imprecise parameters, 2007, Artif. Intell.

[33]  Nicholas Roy, et al.  Efficient model learning for dialog management, 2007, 2nd ACM/IEEE International Conference on Human-Robot Interaction (HRI).

[34]  Steve J. Young, et al.  Partially observable Markov decision processes for spoken dialog systems, 2007, Comput. Speech Lang.

[35]  Christophe Andrieu, et al.  A tutorial on adaptive MCMC, 2008, Stat. Comput.

[36]  Pascal Poupart, et al.  Model-based Bayesian Reinforcement Learning in Partially Observable Domains, 2008, ISAIM.

[37]  David Hsu, et al.  SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces, 2008, Robotics: Science and Systems.

[38]  Nicholas Roy, et al.  Spoken language interaction with model uncertainty: an adaptive human–robot interaction system, 2008, Connect. Sci.

[39]  Joelle Pineau, et al.  Bayesian reinforcement learning in continuous POMDPs with application to robot navigation, 2008, IEEE International Conference on Robotics and Automation (ICRA).

[40]  Joelle Pineau, et al.  Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs, 2008, ICML '08.

[41]  Andrew Y. Ng, et al.  Near-Bayesian exploration in polynomial time, 2009, ICML '09.

[42]  Brett Browning, et al.  A survey of robot learning from demonstration, 2009, Robotics Auton. Syst.

[43]  Lihong Li, et al.  A Bayesian Sampling Approach to Exploration in Reinforcement Learning, 2009, UAI.

[44]  Joelle Pineau, et al.  A Bayesian reinforcement learning approach for customizing human-robot interfaces, 2009, IUI.

[45]  Kee-Eung Kim, et al.  Inverse Reinforcement Learning in Partially Observable Environments, 2009, IJCAI.

[46]  Manuel Lopes, et al.  Active Learning for Reward Estimation in Inverse Reinforcement Learning, 2009, ECML/PKDD.

[47]  Masoumeh T. Izadi, et al.  Sensitivity Analysis of POMDP Value Functions, 2009, International Conference on Machine Learning and Applications.

[48]  Manuela M. Veloso, et al.  Interactive Policy Learning through Confidence-Based Autonomy, 2009, J. Artif. Intell. Res.

[49]  Richard L. Lewis, et al.  Variance-Based Rewards for Approximate Bayesian Reinforcement Learning, 2010, UAI.

[50]  Thomas J. Walsh, et al.  Generalizing Apprenticeship Learning across Hypothesis Classes, 2010, ICML.

[51]  Peter Stone, et al.  Combining manual feedback with subsequent MDP reward signals for reinforcement learning, 2010, AAMAS.

[52]  Joel Veness, et al.  Monte-Carlo Planning in Large POMDPs, 2010, NIPS.