Apprenticeship Bootstrapping

Apprenticeship learning is a learning scheme based on the direct imitation of human demonstrations. Inverse reinforcement learning is used to recover a reward function from human data, and coupling it with reinforcement learning has been shown to produce human-competitive policies. However, for complex tasks, obtaining human subjects with the right level of skill can be a challenge. We propose a new learning scheme, called Apprenticeship Bootstrapping, that learns a composite task from human demonstrations on its sub-tasks. The scenario is a ground-air interaction task in which an Unmanned Aerial Vehicle must keep three autonomous Unmanned Ground Vehicles within the range of an imaging sensor. For validation, we show that the bootstrapped policy performs as well as a policy learnt from a human performing the composite task. The method offers a clear advantage when skilled humans are available for the simpler sub-tasks that form the building blocks of a more complex task for which experts are scarce.
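
The coupling of inverse reinforcement learning with reinforcement learning that the abstract refers to can be made concrete with the projection algorithm of Abbeel and Ng (2004). The sketch below is a minimal, illustrative instance on a toy gridworld with one-hot state features; the grid, the hand-coded "expert", and all names are assumptions for illustration, not the paper's UAV/UGV setup. Under the bootstrapping scheme, the expert feature expectations `mu_E` would instead be assembled from demonstrations on the individual sub-tasks.

```python
import numpy as np

# Minimal sketch of apprenticeship learning via IRL (projection algorithm,
# Abbeel & Ng 2004) on a toy gridworld. Everything here is illustrative;
# it is not the paper's ground-air interaction task.

N = 4                                     # gridworld side length
S = N * N                                 # states; features are one-hot per state
A = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
GAMMA = 0.9

def step(s, a):
    """Deterministic transition, clamped to the grid."""
    r, c = divmod(s, N)
    dr, dc = A[a]
    r2, c2 = min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1)
    return r2 * N + c2

def value_iteration(reward, iters=100):
    """RL step: greedy policy for the current linear reward r(s) = reward[s]."""
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.array([[reward[step(s, a)] + GAMMA * V[step(s, a)]
                       for a in range(len(A))] for s in range(S)])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def feature_expectations(policy, start=0, horizon=30):
    """Discounted one-hot feature expectations of rolling out `policy`."""
    mu, s = np.zeros(S), start
    for t in range(horizon):
        mu[s] += GAMMA ** t
        s = step(s, policy[s])
    return mu

# Stand-in "expert": walks down, then right, to the bottom-right goal cell.
# In Apprenticeship Bootstrapping, mu_E for the composite task would be
# built from sub-task demonstrations instead of a single composite expert.
expert = np.array([1 if divmod(s, N)[0] < N - 1 else 3 for s in range(S)])
mu_E = feature_expectations(expert)

# Projection IRL: alternate reward fitting and RL until features match.
mu_bar = feature_expectations(np.zeros(S, dtype=int))  # arbitrary initial policy
for _ in range(30):
    w = mu_E - mu_bar                  # reward weights, r(s) = w[s]
    policy = value_iteration(w)        # solve the MDP under the current reward
    mu = feature_expectations(policy)
    d = mu - mu_bar                    # projection update (Abbeel & Ng, 2004)
    mu_bar = mu_bar + (d @ (mu_E - mu_bar)) / (d @ d + 1e-12) * d
    if np.linalg.norm(mu_E - mu_bar) < 1e-3:
        break
print("feature expectation gap:", np.linalg.norm(mu_E - mu_bar))
```

The inner `value_iteration` call plays the role of the reinforcement-learning step; any RL solver for the current reward could be substituted there without changing the overall scheme.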
