A Model-Based Method for Learning Locomotion Skills from Demonstration
While Generative Adversarial Imitation Learning (GAIL) achieves remarkable performance on many high-dimensional imitation learning tasks, it requires a large number of sampled transitions, which is infeasible for some real-world problems. In this paper, we demonstrate how exploiting the reward function in GAIL can improve sample efficiency. We design our algorithm to be end-to-end differentiable so that the learned reward function can directly participate in policy updates. End-to-end differentiability is achieved by introducing a forward model of the environment, which enables direct calculation of the cumulative reward. However, using a forward model has two significant limitations: the method relies heavily on the forward model's accuracy, and it requires multi-step prediction, which causes severe error accumulation. The proposed end-to-end differentiable adversarial imitation learning algorithm alleviates these limitations. In addition, we apply several existing regularization techniques for robust training of the forward model. We call our algorithm, integrated with these regularization methods, fully Differentiable Regularized GAIL (DRGAIL), and evaluate DRGAIL on continuous control tasks.
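The abstract does not give implementation details, but the core idea of backpropagating a GAIL-style reward through a learned forward model can be sketched as follows. This is a minimal illustration, not the authors' method: the network sizes, rollout horizon, reward form (log D from the discriminator logits), and all names below are assumptions made for the example.

```python
# Minimal sketch (assumed, not the authors' code) of an end-to-end differentiable
# imitation objective: a learned forward model f(s, a) predicts the next state,
# so the discriminator-based reward summed over an imagined rollout can be
# backpropagated directly into the policy parameters.

import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, HORIZON = 17, 6, 5  # assumed dimensions and rollout length

policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                       nn.Linear(64, ACTION_DIM))
forward_model = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.Tanh(),
                              nn.Linear(64, STATE_DIM))   # predicts next state
discriminator = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.Tanh(),
                              nn.Linear(64, 1))           # GAIL-style discriminator

def imagined_return(s0: torch.Tensor) -> torch.Tensor:
    """Roll the policy through the learned forward model and accumulate a
    discriminator-based reward; the result is differentiable w.r.t. the policy."""
    s, total = s0, 0.0
    for _ in range(HORIZON):
        a = policy(s)
        logits = discriminator(torch.cat([s, a], dim=-1))
        # -softplus(-logits) == log D(s, a): higher when (s, a) looks expert-like
        total = total - F.softplus(-logits).mean()
        # multi-step prediction through the model: errors accumulate with horizon
        s = forward_model(torch.cat([s, a], dim=-1))
    return total

# One illustrative policy update: maximize the imagined imitation return.
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
loss = -imagined_return(torch.randn(32, STATE_DIM))
opt.zero_grad()
loss.backward()
opt.step()
```

In this sketch the gradient flows from the reward through both the discriminator and the forward model into the policy, which is why the approach depends heavily on forward-model quality and why regularizing its training matters.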