Optimizing a High-Dimensional Learner with Low-Dimensional Action Features

Model-free reinforcement learning can learn high-dimensional robotic tasks, but its demand for large-scale training data makes it hard to reach good performance in limited time. Model-based methods, by contrast, learn low-dimensional tasks efficiently but lack the extensibility needed for complex robotic tasks. Intuitively, combining the advantages of both by transferring knowledge to higher dimensions may improve sample efficiency and model accuracy. In this thesis, we present a hybrid framework that transfers low-dimensional action features to a high-dimensional deep reinforcement learning model through imitation learning, reducing the amount of training data needed to reach practical performance. We evaluate the hybrid framework on simulated locomotion tasks and show that it improves the model-free learning process. Our hybrid algorithm outperforms the pure model-free method, exploiting the low-dimensional action features efficiently while remaining competitive in model accuracy.
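The core idea of transferring low-dimensional action features to a higher-dimensional learner via imitation learning can be illustrated with a minimal behavioral-cloning sketch. This is a hypothetical toy example, not the thesis's actual model: the linear policy form, the `lift` feature map, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical low-dimensional "expert": maps 2-D states to 1-D actions.
def expert_action(s):
    return s @ np.array([0.5, -0.3])

# The high-dimensional learner observes a lifted 6-D state
# (e.g. extra sensor channels); this lifting is an illustrative assumption.
def lift(s):
    return np.concatenate([s, s**2, np.ones(2) * 0.1])

states = rng.normal(size=(200, 2))
X = np.stack([lift(s) for s in states])           # learner inputs (6-D)
y = np.array([expert_action(s) for s in states])  # expert action targets

# Behavioral cloning as least squares: argmin_w ||X w - y||^2.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
mse = float(np.mean((X @ w - y) ** 2))
print(f"cloning MSE: {mse:.2e}")
```

Because the lifted state contains the original low-dimensional state, the high-dimensional learner can clone the expert's actions almost exactly; in the thesis's setting the learner is a deep RL policy rather than a linear regressor, but the supervised imitation signal plays the same role.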
