PAC-Bayesian Policy Evaluation for Reinforcement Learning
Joelle Pineau | Csaba Szepesvári | Mahdi Milani Fard
[1] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[2] Shie Mannor,et al. Regularized Policy Iteration , 2008, NIPS.
[3] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[4] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998.
[5] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res.
[6] Michael L. Littman,et al. A theoretical analysis of Model-Based Interval Estimation , 2005, ICML.
[7] Paul-Marie Samson,et al. Concentration of measure inequalities for Markov chains and $\Phi$-mixing processes , 2000.
[8] Richard L. Tweedie,et al. Markov Chains and Stochastic Stability , 1993, Communications and Control Engineering Series.
[9] Jesse Hoey,et al. An analytic solution to discrete Bayesian reinforcement learning , 2006, ICML.
[10] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[11] Tao Wang,et al. Bayesian sparse sampling for on-line reward optimization , 2005, ICML.
[12] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[13] Sham M. Kakade,et al. On the sample complexity of reinforcement learning , 2003.
[14] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[15] Andrew G. Barto,et al. Optimal learning: computational procedures for bayes-adaptive markov decision processes , 2002.
[16] S. Boucheron,et al. Theory of classification: a survey of some recent advances , 2005.
[17] Leslie G. Valiant,et al. A theory of the learnable , 1984, CACM.
[18] François Laviolette,et al. PAC-Bayesian learning of linear classifiers , 2009, ICML '09.
[19] John Shawe-Taylor,et al. A PAC analysis of a Bayesian estimator , 1997, COLT '97.
[20] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[21] Andrew Y. Ng,et al. Near-Bayesian exploration in polynomial time , 2009, ICML '09.
[22] David A. McAllester. Some PAC-Bayesian Theorems , 1998, COLT '98.
[23] Joelle Pineau,et al. PAC-Bayesian Model Selection for Reinforcement Learning , 2010, NIPS.
[24] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res.
[25] John Shawe-Taylor,et al. PAC-Bayesian Analysis of Martingales and Multiarmed Bandits , 2011, ArXiv.
[26] John Shawe-Taylor,et al. PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off , 2011, ICML 2011.