Compatible natural gradient policy search
Jan Peters | Gerhard Neumann | Joni Pajarinen | Riad Akrour | Hong Linh Thai
[1] Jan Peters, et al. Model-Free Trajectory-based Policy Optimization with Monotonic Improvement, 2016, J. Mach. Learn. Res.
[2] Koray Kavukcuoglu, et al. PGQ: Combining policy gradient and Q-learning, 2016, ArXiv.
[3] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[4] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[5] Hany Abdulsamad, et al. Model-Free Trajectory Optimization for Reinforcement Learning, 2016, ICML.
[6] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[7] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[8] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.
[9] Jan Peters, et al. Policy Search for Motor Primitives in Robotics, 2011, Mach. Learn.
[10] Tapani Raiko, et al. International Conference on Learning Representations (ICLR), 2016.
[11] Elman Mansimov, et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, 2017, NIPS.
[12] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[13] Jeff G. Schneider, et al. Covariant policy search, 2003, IJCAI.
[14] Germán Ros, et al. CARLA: An Open Urban Driving Simulator, 2017, CoRL.
[15] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[16] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[17] Luís Paulo Reis, et al. Model-Based Relative Entropy Stochastic Search, 2016, NIPS.
[18] Matthieu Geist, et al. Revisiting Natural Actor-Critics with Value Function Approximation, 2010, MDAI.
[19] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[20] Masashi Sugiyama, et al. Guide Actor-Critic for Continuous Control, 2017, ICLR.
[21] Jan Peters, et al. Hierarchical Relative Entropy Policy Search, 2014, AISTATS.
[22] Tom Schaul, et al. Natural Evolution Strategies, 2008, IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).
[23] Joelle Pineau, et al. Online Planning Algorithms for POMDPs, 2008, J. Artif. Intell. Res.
[24] Nikolaus Hansen, et al. Completely Derandomized Self-Adaptation in Evolution Strategies, 2001, Evolutionary Computation.
[25] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[26] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[27] R. Rubinstein. The Cross-Entropy Method for Combinatorial and Continuous Optimization, 1999.
[28] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[29] François Laviolette, et al. Domain-Adversarial Training of Neural Networks, 2015, J. Mach. Learn. Res.
[30] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[31] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[32] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[33] Guillaume Hennequin, et al. Exact natural gradient in deep linear networks and its application to the nonlinear case, 2018, NeurIPS.
[34] Jan Peters, et al. Policy Search for Motor Primitives in Robotics, 2008, NIPS.