Better Sampling Strategy for Locomotion Control Tasks

Recently, model-free reinforcement learning algorithms such as TRPO have achieved great success in solving locomotion control tasks. However, for difficult locomotion problems with high-dimensional visual observations, these algorithms are not sample efficient. This paper proposes an Ornstein-Uhlenbeck (OU) process sampling strategy for locomotion control tasks. Experimental results show that TRPO with the OU process sampling strategy achieves better performance and better convergence than TRPO without it.
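
To make the idea of OU process sampling concrete, the following is a minimal sketch of temporally correlated action noise, using an Euler-Maruyama discretization of dx = theta * (mu - x) dt + sigma dW. The abstract does not say how the noise is combined with TRPO, so the `OUNoise` class, the `perturb_action` helper, the parameter values (`theta`, `sigma`, `dt`), and the action bounds are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


class OUNoise:
    """Hypothetical OU noise source: dx = theta * (mu - x) dt + sigma dW."""

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=None):
        # theta, sigma and dt are illustrative values, not taken from the paper.
        self.dim = dim
        self.mu = mu
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Start each episode at the long-run mean of the process.
        self.state = [self.mu] * self.dim

    def sample(self):
        # One Euler-Maruyama step: drift toward mu plus Gaussian
        # diffusion scaled by sqrt(dt).
        scale = self.sigma * math.sqrt(self.dt)
        self.state = [
            x + self.theta * (self.mu - x) * self.dt
            + scale * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def perturb_action(mean_action, noise, low=-1.0, high=1.0):
    # Assumed usage: add correlated noise to the policy's mean action,
    # then clip to the (assumed) action bounds.
    eps = noise.sample()
    return [min(high, max(low, a + e)) for a, e in zip(mean_action, eps)]
```

Because each sample depends on the previous one, consecutive perturbations are correlated, unlike independent Gaussian noise drawn at every step. Calling `reset()` at the start of each episode returns the process to its mean.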
