Andreas Doerr | Michael Volpp | Christian Daniel | Sebastian Trimpe | Marc Toussaint
[1] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[2] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[3] Andreas Krause, et al. Advances in Neural Information Processing Systems (NIPS), 2014.
[4] Larry Rudolph, et al. Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?, 2018, ArXiv.
[5] P. Wawrzynski, et al. Truncated Importance Sampling for Reinforcement Learning with Experience Replay, 2007.
[6] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[7] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[8] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[9] Zhanxing Zhu, et al. Neural Information Processing Systems (NIPS), 2015.
[10] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[11] Jan Peters, et al. A Survey on Policy Search for Robotics, 2013, Found. Trends Robotics.
[12] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.
[13] Jun Morimoto, et al. Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration, 2012, Neural Computation.
[14] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[15] Leonid Peshkin, et al. Learning from Scarce Experience, 2002, ICML.
[16] Marcello Restelli, et al. Policy Optimization via Importance Sampling, 2018, NeurIPS.
[17] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[18] Marcin Andrychowicz, et al. Parameter Space Noise for Exploration, 2017, ICLR.
[19] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[20] Christian R. Shelton, et al. Policy Improvement for POMDPs Using Normalized Importance Sampling, 2001, UAI.
[21] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[22] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[23] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[24] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[25] Stefan Schaal, et al. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients, 2008.
[26] Leslie Pack Kaelbling, et al. Off-Policy Policy Search, 2007.
[27] Rémi Munos, et al. Policy Gradient in Continuous Time, 2006, J. Mach. Learn. Res.
[28] Frank Sehnke, et al. Policy Gradients with Parameter-Based Exploration for Control, 2008, ICANN.
[29] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[30] Pieter Abbeel, et al. On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient, 2010, NIPS.
[31] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[32] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.