Apprenticeship Bootstrapping

Apprenticeship learning is a learning scheme based on the direct imitation of human demonstrations. Inverse reinforcement learning is used to recover a reward function from human data, and coupling it with reinforcement learning has been shown to produce human-competitive policies. However, for complex tasks, obtaining human subjects with the right level of skill can be a challenge. We propose a new learning scheme, called Apprenticeship Bootstrapping, that learns a composite task from human demonstrations on its sub-tasks. The scenario is a ground-air interaction task in which an Unmanned Aerial Vehicle must keep three autonomous Unmanned Ground Vehicles within the range of an imaging sensor. For validation, we show that the bootstrapped policy performs as well as a policy learnt from a human performing the composite task. The method offers a clear advantage when skilled humans are available for the simpler sub-tasks that form the building blocks of a more complex task for which experts are scarce.
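
The coupling of inverse reinforcement learning with reinforcement learning that the abstract refers to can be made concrete with the projection algorithm of Abbeel and Ng (2004). The sketch below is a minimal, illustrative instance on a toy gridworld with one-hot state features; the grid, the hand-coded "expert", and all names are assumptions for illustration, not the paper's UAV/UGV setup. Under the bootstrapping scheme, the expert feature expectations `mu_E` would instead be assembled from demonstrations on the individual sub-tasks.

```python
import numpy as np

# Minimal sketch of apprenticeship learning via IRL (projection algorithm,
# Abbeel & Ng 2004) on a toy gridworld. Everything here is illustrative;
# it is not the paper's ground-air interaction task.

N = 4                                     # gridworld side length
S = N * N                                 # states; features are one-hot per state
A = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
GAMMA = 0.9

def step(s, a):
    """Deterministic transition, clamped to the grid."""
    r, c = divmod(s, N)
    dr, dc = A[a]
    r2, c2 = min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1)
    return r2 * N + c2

def value_iteration(reward, iters=100):
    """RL step: greedy policy for the current linear reward r(s) = reward[s]."""
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.array([[reward[step(s, a)] + GAMMA * V[step(s, a)]
                       for a in range(len(A))] for s in range(S)])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def feature_expectations(policy, start=0, horizon=30):
    """Discounted one-hot feature expectations of rolling out `policy`."""
    mu, s = np.zeros(S), start
    for t in range(horizon):
        mu[s] += GAMMA ** t
        s = step(s, policy[s])
    return mu

# Stand-in "expert": walks down, then right, to the bottom-right goal cell.
# In Apprenticeship Bootstrapping, mu_E for the composite task would be
# built from sub-task demonstrations instead of a single composite expert.
expert = np.array([1 if divmod(s, N)[0] < N - 1 else 3 for s in range(S)])
mu_E = feature_expectations(expert)

# Projection IRL: alternate reward fitting and RL until features match.
mu_bar = feature_expectations(np.zeros(S, dtype=int))  # arbitrary initial policy
for _ in range(30):
    w = mu_E - mu_bar                  # reward weights, r(s) = w[s]
    policy = value_iteration(w)        # solve the MDP under the current reward
    mu = feature_expectations(policy)
    d = mu - mu_bar                    # projection update (Abbeel & Ng, 2004)
    mu_bar = mu_bar + (d @ (mu_E - mu_bar)) / (d @ d + 1e-12) * d
    if np.linalg.norm(mu_E - mu_bar) < 1e-3:
        break
print("feature expectation gap:", np.linalg.norm(mu_E - mu_bar))
```

The inner `value_iteration` call plays the role of the reinforcement-learning step; any RL solver for the current reward could be substituted there without changing the overall scheme.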
