Adaptive Batch Size for Safe Policy Gradients