Deep Learning of Robotic Tasks without a Simulator using Strong and Weak Human Supervision

We propose a scheme for training a computerized agent to perform complex human tasks such as highway steering. The scheme is designed to follow a natural learning process whereby a human instructor teaches a computerized trainee. The learning process consists of five elements: (i) unsupervised feature learning; (ii) supervised imitation learning; (iii) supervised reward induction; (iv) supervised safety module construction; and (v) reinforcement learning. We implemented the last four elements of the scheme using deep convolutional networks and applied the scheme to successfully create a computerized agent capable of autonomous highway steering in the well-known racing game Assetto Corsa. We demonstrate that the use of the last four elements is essential to effectively carry out the steering task using vision alone, without access to the driving simulator's internals, and operating in wall-clock time. This is made possible in part through the introduction of a safety network, a novel mechanism for preventing the agent from making catastrophic mistakes during the reinforcement learning stage.
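
To make the structure of the scheme concrete, the sketch below outlines how the final reinforcement learning stage could be organized, with an induced reward network scoring transitions in place of a simulator-provided reward and a safety network vetoing actions predicted to be catastrophic. This is a minimal illustrative sketch, not the authors' implementation: all class names (Env, PolicyNet, RewardNet, SafetyNet), their interfaces, and the placeholder logic inside them are hypothetical.

```python
"""Minimal sketch of the reinforcement learning stage with an induced reward
network and a safety network, assuming hypothetical placeholder components."""

import random


class Env:
    """Stand-in for the vision-only driving environment (e.g. screen captures)."""
    def reset(self):
        return 0.0  # initial observation (a real system would return an image)

    def step(self, action):
        # next observation and an episode-done flag (placeholders)
        return random.random(), random.random() < 0.05


class PolicyNet:
    """Imitation-pretrained steering policy, later fine-tuned with RL."""
    def act(self, obs):
        return random.choice([-1, 0, 1])  # steer left / straight / right

    def safe_action(self, obs):
        return 0  # conservative fallback, e.g. keep the wheel straight

    def update(self, obs, action, reward, next_obs):
        pass  # the RL update (e.g. a Q-learning step) would go here


class RewardNet:
    """Supervised reward-induction network scoring (observation, action) pairs."""
    def score(self, obs, action):
        return -0.1 * abs(action) + random.random()  # placeholder reward signal


class SafetyNet:
    """Safety module predicting whether an action is likely to be catastrophic."""
    def is_unsafe(self, obs, action):
        return random.random() < 0.01  # placeholder classifier output


def reinforcement_learning_stage(env, policy, reward_net, safety_net, episodes=10):
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            action = policy.act(obs)
            if safety_net.is_unsafe(obs, action):
                # Safety network overrides actions predicted to be catastrophic
                action = policy.safe_action(obs)
            next_obs, done = env.step(action)
            # Induced reward replaces the unavailable simulator reward
            reward = reward_net.score(obs, action)
            policy.update(obs, action, reward, next_obs)
            obs = next_obs


if __name__ == "__main__":
    reinforcement_learning_stage(Env(), PolicyNet(), RewardNet(), SafetyNet())
```

In this reading, the safety module acts as a runtime filter between the policy and the environment, so exploration during reinforcement learning never executes an action the classifier flags as likely catastrophic.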
