Better Sampling Strategy for Locomotion Control Tasks
[1] Guanrong Chen, et al. Fuzzy PID control of a flexible-joint robot arm with uncertainties from time-varying loads, 1997, IEEE Trans. Control. Syst. Technol.
[2] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[3] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[4] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[5] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[6] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[7] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[8] Sebastian Scherer, et al. Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution, 2017, ICML.
[9] Günther Palm, et al. Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax, 2011, KI.
[10] Razvan Pascanu, et al. Theano: new features and speed improvements, 2012, ArXiv.
[11] Gillespie, et al. Exact numerical simulation of the Ornstein-Uhlenbeck process and its integral, 1996, Physical Review E: Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics.
[12] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[13] R. J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.