Bayesian Policy Gradient and Actor-Critic Algorithms
[1] F. E. Calcul des Probabilités , 1889, Nature.
[2] Karl Johan Åström,et al. Optimal control of Markov processes with incomplete state information , 1965 .
[3] M. Ciletti,et al. The computation and theory of optimal control , 1972 .
[4] David Q. Mayne,et al. Differential dynamic programming , 1972, The Mathematical Gazette.
[5] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon , 1973, Oper. Res..
[6] L. Hasdorff. Gradient Optimization and Nonlinear Control , 1976 .
[7] J. Gittins. Bandit processes and dynamic allocation indices , 1979 .
[8] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[9] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[10] Peter W. Glynn,et al. Proceedings of the 1986 Winter Simulation Conference , 1986 .
[11] Alan Weiss,et al. Sensitivity analysis via likelihood ratios , 1986, WSC '86.
[12] Anthony O'Hagan,et al. Monte Carlo is fundamentally unsound , 1987 .
[13] Alan Weiss,et al. Sensitivity Analysis for Simulations via Likelihood Ratios , 1989, Oper. Res..
[14] Vijaykumar Gullapalli,et al. A stochastic reinforcement learning algorithm for learning real-valued functions , 1990, Neural Networks.
[15] Peter W. Glynn,et al. Likelihood ratio gradient estimation for stochastic systems , 1990, CACM.
[16] A. O'Hagan,et al. Bayes–Hermite quadrature , 1991 .
[17] Vijaykumar Gullapalli,et al. Learning Control Under Extreme Uncertainty , 1992, NIPS.
[18] Eduardo D. Sontag,et al. Neural Networks for Control , 1993 .
[19] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[20] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[21] V. Gullapalli,et al. Acquiring robot skills via reinforcement learning , 1994, IEEE Control Systems.
[22] Shigenobu Kobayashi,et al. Reinforcement Learning by Stochastic Hill Climbing on Discounted Reward , 1995, ICML.
[23] P. Glynn,et al. Likelihood ratio gradient estimation for stochastic recursions , 1995, Advances in Applied Probability.
[24] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[25] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..
[26] Stuart J. Russell. Learning agents for uncertain environments (extended abstract) , 1998, COLT' 98.
[27] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[28] David Haussler,et al. Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.
[29] David Andre,et al. Model based Bayesian Exploration , 1999, UAI.
[30] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[31] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[32] Malcolm J. A. Strens,et al. A Bayesian Framework for Reinforcement Learning , 2000, ICML.
[33] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[34] Lex Weaver,et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning , 2001, UAI.
[35] Peter L. Bartlett,et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[36] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[37] Michael O. Duff,et al. Monte-Carlo Algorithms for the Improvement of Finite-State Stochastic Controllers: Application to Bayes-Adaptive Markov Decision Processes , 2001, AISTATS.
[38] Lehel Csató,et al. Sparse On-Line Gaussian Processes , 2002, Neural Computation.
[39] Carl E. Rasmussen,et al. Bayesian Monte Carlo , 2002, NIPS.
[40] Shie Mannor,et al. Sparse Online Greedy Support Vector Regression , 2002, ECML.
[41] Stefan Schaal,et al. Reinforcement Learning for Humanoid Robotics , 2003 .
[42] Jeff G. Schneider,et al. Covariant policy search , 2003, IJCAI 2003.
[43] Shie Mannor,et al. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning , 2003, ICML.
[44] Nello Cristianini,et al. Kernel Methods for Pattern Analysis , 2003, ICTAI.
[45] Craig Boutilier,et al. Coordination in multiagent reinforcement learning: a Bayesian approach , 2003, AAMAS '03.
[46] Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res..
[47] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[48] Tao Wang,et al. Bayesian sparse sampling for on-line reward optimization , 2005, ICML.
[49] Yaakov Engel,et al. Algorithms and representations for reinforcement learning (Ph.D. thesis; includes a Hebrew abstract and additional title page) , 2005 .
[50] Shie Mannor,et al. Reinforcement learning with Gaussian processes , 2005, ICML.
[51] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[52] Jesse Hoey,et al. An analytic solution to discrete Bayesian reinforcement learning , 2006, ICML.
[53] Mohammad Ghavamzadeh,et al. Bayesian Policy Gradient Algorithms , 2006, NIPS.
[54] Shalabh Bhatnagar,et al. Incremental Natural Actor-Critic Algorithms , 2007, NIPS.
[55] Mohammad Ghavamzadeh,et al. Bayesian actor-critic algorithms , 2007, ICML '07.
[56] Joelle Pineau,et al. Bayes-Adaptive POMDPs , 2007, NIPS.
[57] Alan Fern,et al. Multi-task reinforcement learning: a hierarchical Bayesian approach , 2007, ICML '07.
[58] Eyal Amir,et al. Bayesian Inverse Reinforcement Learning , 2007, IJCAI.
[59] Sriraam Natarajan,et al. Transfer in variable-reward hierarchical reinforcement learning , 2008, Machine Learning.
[60] Joelle Pineau,et al. Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs , 2008, ICML '08.
[61] Stefan Schaal,et al. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients , 2008 .
[62] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom..
[63] Andrew Y. Ng,et al. Near-Bayesian exploration in polynomial time , 2009, ICML '09.
[64] Lihong Li,et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning , 2009, UAI.
[65] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.
[66] Richard L. Lewis,et al. Variance-Based Rewards for Approximate Bayesian Reinforcement Learning , 2010, UAI.
[67] Joaquin Quiñonero Candela,et al. Web-Scale Bayesian Click-Through rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine , 2010, ICML.
[68] Alessandro Lazaric,et al. Bayesian Multi-Task Reinforcement Learning , 2010, ICML.
[69] U. Rieder,et al. Markov Decision Processes , 2010 .
[70] Steven L. Scott,et al. A modern Bayesian look at the multi-armed bandit , 2010 .
[71] TaeChoong Chung,et al. Hessian matrix distribution for Bayesian policy gradient reinforcement learning , 2011, Inf. Sci..
[72] J. Grossman. The Likelihood Principle , 2011 .
[73] Kee-Eung Kim,et al. MAP Inference for Bayesian Inverse Reinforcement Learning , 2011, NIPS.
[74] Eduardo F. Morales,et al. An Introduction to Reinforcement Learning , 2011 .
[75] Lihong Li,et al. An Empirical Evaluation of Thompson Sampling , 2011, NIPS.
[76] Rémi Munos,et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis , 2012, ALT.
[77] Jonathan P. How,et al. Improving the efficiency of Bayesian inverse reinforcement learning , 2012, 2012 IEEE International Conference on Robotics and Automation.
[78] Aurélien Garivier,et al. On Bayesian Upper Confidence Bounds for Bandit Problems , 2012, AISTATS.
[79] Kee-Eung Kim,et al. Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions , 2012, NIPS.
[80] Jonathan P. How,et al. Bayesian Nonparametric Inverse Reinforcement Learning , 2012, ECML/PKDD.
[81] Shipra Agrawal,et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem , 2011, COLT.
[82] Shipra Agrawal,et al. Further Optimal Regret Bounds for Thompson Sampling , 2012, AISTATS.
[83] Liang Tang,et al. Automatic ad format selection via contextual bandits , 2013, CIKM.
[84] Shipra Agrawal,et al. Thompson Sampling for Contextual Bandits with Linear Payoffs , 2012, ICML.
[85] Sudipto Guha,et al. Stochastic Regret Minimization via Thompson Sampling , 2014, COLT.
[86] Shie Mannor,et al. Thompson Sampling for Complex Online Problems , 2013, ICML.
[87] Shie Mannor,et al. Bayesian Reinforcement Learning: A Survey , 2015, Found. Trends Mach. Learn..
[88] Lihong Li,et al. On the Prior Sensitivity of Thompson Sampling , 2015, ALT.
[89] Mohammad Ghavamzadeh,et al. Bayesian policy gradient and actor-critic algorithms , 2016 .
[90] Benjamin Van Roy,et al. An Information-Theoretic Analysis of Thompson Sampling , 2014, J. Mach. Learn. Res..