Guide Actor-Critic for Continuous Control
Voot Tangkaratt | Abbas Abdolmaleki | Masashi Sugiyama