Exploiting the Sign of the Advantage Function to Learn Deterministic Policies in Continuous Domains