A PAC RL Algorithm for Episodic POMDPs
[1] Naoki Abe, et al. On the computational complexity of approximating distributions by probabilistic automata, 1990, Machine Learning.
[2] Joelle Pineau, et al. A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes, 2011, J. Mach. Learn. Res.
[3] Dean Alderucci. A Spectral Algorithm for Learning Hidden Markov Models That Have Silent States, 2015.
[4] Leonid Peshkin, et al. Bounds on Sample Size for Policy Evaluation in Markov Environments, 2001, COLT/EuroCOLT.
[5] Richard S. Sutton, et al. Predictive Representations of State, 2001, NIPS.
[6] Tor Lattimore, et al. PAC Bounds for Discounted MDPs, 2012, ALT.
[7] Andrew Y. Ng, et al. Near-Bayesian exploration in polynomial time, 2009, ICML '09.
[8] Stefanos Nikolaidis, et al. Efficient Model Learning from Joint-Action Demonstrations for Human-Robot Collaborative Tasks, 2015, 10th ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[9] Le Song, et al. Hilbert Space Embeddings of Hidden Markov Models, 2010, ICML.
[10] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[11] Michael L. Littman, et al. A theoretical analysis of Model-Based Interval Estimation, 2005, ICML.
[12] Lihong Li, et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning, 2009, UAI.
[13] Anima Anandkumar, et al. Tensor decompositions for learning latent variable models, 2012, J. Mach. Learn. Res.
[14] Michael L. Littman, et al. An analysis of model-based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[15] Li Ling Ko, et al. Structured Parameter Elicitation, 2010, AAAI.
[16] Craig Boutilier, et al. A POMDP formulation of preference elicitation problems, 2002, AAAI/IAAI.
[17] E. Ordentlich, et al. Inequalities for the L1 Deviation of the Empirical Distribution, 2003.
[18] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[19] Alessandro Lazaric, et al. Sequential Transfer in Multi-armed Bandit with Finite Set of Models, 2013, NIPS.
[20] Joelle Pineau, et al. A Variance Analysis for POMDP Policy Evaluation, 2008, AAAI.
[21] Nicholas Roy, et al. Bayesian nonparametric approaches for reinforcement learning in partially observable domains, 2012.
[22] Christopher Amato, et al. Diagnose and Decide: An Optimal Bayesian Approach, 2012, NIPS.
[23] Kaare Brandt Petersen, et al. The Matrix Cookbook, 2006.
[24] Anima Anandkumar, et al. A Method of Moments for Mixture Models and Hidden Markov Models, 2012, COLT.
[25] Byron Boots, et al. Closing the learning-planning loop with predictive state representations, 2009, Int. J. Robotics Res.
[26] Yishay Mansour, et al. Reinforcement Learning in POMDPs Without Resets, 2005, IJCAI.
[27] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[28] Lihong Li, et al. Incremental Model-based Learners With Formal Learning-Time Guarantees, 2006, UAI.
[29] Masoumeh T. Izadi, et al. Sensitivity Analysis of POMDP Value Functions, 2009, International Conference on Machine Learning and Applications.
[30] Joelle Pineau, et al. Point-based value iteration: An anytime algorithm for POMDPs, 2003, IJCAI.