Reinforcement Learning with Limited Reinforcement: Using Bayes Risk for Active Learning in POMDPs
Finale Doshi, Joelle Pineau, Nicholas Roy

[1]  Edward J. Sondik, et al.  The optimal control of partially observable Markov processes, 1971.

[2]  E. J. Sondik, et al.  The Optimal Control of Partially Observable Markov Decision Processes, 1971.

[3]  C. Watkins.  Learning from delayed rewards, 1989.

[4]  Lawrence R. Rabiner, et al.  A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[5]  Lonnie Chrisman, et al.  Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.

[6]  Leslie Pack Kaelbling, et al.  Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.

[7]  Leslie Pack Kaelbling, et al.  Acting under uncertainty: discrete Bayesian models for mobile-robot navigation, 1996, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '96).

[8]  David Andre, et al.  Model based Bayesian Exploration, 1999, UAI.

[9]  Joelle Pineau, et al.  Spoken Dialogue Management Using Probabilistic Reasoning, 2000, ACL.

[10]  Malcolm J. A. Strens, et al.  A Bayesian Framework for Reinforcement Learning, 2000, ICML.

[11]  Milos Hauskrecht, et al.  Value-Function Approximations for Partially Observable Markov Decision Processes, 2000, J. Artif. Intell. Res.

[12]  Andrew Y. Ng, et al.  Algorithms for Inverse Reinforcement Learning, 2000, ICML.

[13]  S. L. Scott.  Bayesian Methods for Hidden Markov Models, 2002.

[14]  P. Del Moral, et al.  Sequential Monte Carlo samplers, 2002, cond-mat/0212648.

[15]  Craig Boutilier, et al.  A POMDP formulation of preference elicitation problems, 2002, AAAI/IAAI.

[16]  Joelle Pineau, et al.  Point-based value iteration: An anytime algorithm for POMDPs, 2003, IJCAI.

[17]  L. El Ghaoui, et al.  Robust Markov decision processes with uncertain transition matrices, 2004.

[18]  Nikos A. Vlassis, et al.  A point-based POMDP algorithm for robot planning, 2004, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '04).

[19]  P. Pu, et al.  Survey of Preference Elicitation Methods, Technical Report IC/2004/67, 2004.

[20]  Joelle Pineau, et al.  Active Learning in Partially Observable Markov Decision Processes, 2005, ECML.

[21]  Yossi Aviv, et al.  A Partially Observed Markov Decision Process for Dynamic Pricing, 2005, Manag. Sci.

[22]  Yishay Mansour, et al.  Reinforcement Learning in POMDPs Without Resets, 2005, IJCAI.

[23]  J. D. Williams, et al.  Scaling up POMDPs for Dialog Management: The "Summary POMDP" Method, 2005, IEEE Workshop on Automatic Speech Recognition and Understanding.

[24]  B. White.  An economic analysis of ecological monitoring, 2005.

[25]  Jesse Hoey, et al.  POMDP Models for Assistive Technology, 2005, AAAI Fall Symposium: Caring Machines.

[26]  Guy Shani, et al.  Model-Based Online Learning of POMDPs, 2005, ECML.

[27]  Pascal Poupart, et al.  Point-Based Value Iteration for Continuous POMDPs, 2006, J. Mach. Learn. Res.

[28]  Jesse Hoey, et al.  An analytic solution to discrete Bayesian reinforcement learning, 2006, ICML.

[29]  Lihong Li, et al.  Incremental Model-based Learners With Formal Learning-Time Guarantees, 2006, UAI.

[30]  Joelle Pineau, et al.  Bayes-Adaptive POMDPs, 2007, NIPS.

[31]  Guy Shani, et al.  Forward Search Value Iteration for POMDPs, 2007, IJCAI.

[32]  Hideaki Itoh, et al.  Partially observable Markov decision processes with imprecise parameters, 2007, Artif. Intell.

[33]  Nicholas Roy, et al.  Efficient model learning for dialog management, 2007, 2nd ACM/IEEE International Conference on Human-Robot Interaction (HRI).

[34]  Steve J. Young, et al.  Partially observable Markov decision processes for spoken dialog systems, 2007, Comput. Speech Lang.

[35]  Christophe Andrieu, et al.  A tutorial on adaptive MCMC, 2008, Stat. Comput.

[36]  Pascal Poupart, et al.  Model-based Bayesian Reinforcement Learning in Partially Observable Domains, 2008, ISAIM.

[37]  David Hsu, et al.  SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces, 2008, Robotics: Science and Systems.

[38]  Nicholas Roy, et al.  Spoken language interaction with model uncertainty: an adaptive human–robot interaction system, 2008, Connect. Sci.

[39]  Joelle Pineau, et al.  Bayesian reinforcement learning in continuous POMDPs with application to robot navigation, 2008, IEEE International Conference on Robotics and Automation (ICRA).

[40]  Joelle Pineau, et al.  Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs, 2008, ICML '08.

[41]  Andrew Y. Ng, et al.  Near-Bayesian exploration in polynomial time, 2009, ICML '09.

[42]  Brett Browning, et al.  A survey of robot learning from demonstration, 2009, Robotics Auton. Syst.

[43]  Lihong Li, et al.  A Bayesian Sampling Approach to Exploration in Reinforcement Learning, 2009, UAI.

[44]  Joelle Pineau, et al.  A Bayesian reinforcement learning approach for customizing human-robot interfaces, 2009, IUI.

[45]  Kee-Eung Kim, et al.  Inverse Reinforcement Learning in Partially Observable Environments, 2009, IJCAI.

[46]  Manuel Lopes, et al.  Active Learning for Reward Estimation in Inverse Reinforcement Learning, 2009, ECML/PKDD.

[47]  Masoumeh T. Izadi, et al.  Sensitivity Analysis of POMDP Value Functions, 2009, International Conference on Machine Learning and Applications.

[48]  Manuela M. Veloso, et al.  Interactive Policy Learning through Confidence-Based Autonomy, 2009, J. Artif. Intell. Res.

[49]  Richard L. Lewis, et al.  Variance-Based Rewards for Approximate Bayesian Reinforcement Learning, 2010, UAI.

[50]  Thomas J. Walsh, et al.  Generalizing Apprenticeship Learning across Hypothesis Classes, 2010, ICML.

[51]  Peter Stone, et al.  Combining manual feedback with subsequent MDP reward signals for reinforcement learning, 2010, AAMAS.

[52]  Joel Veness, et al.  Monte-Carlo Planning in Large POMDPs, 2010, NIPS.