Deep Learning of Robotic Tasks without a Simulator using Strong and Weak Human Supervision

We propose a scheme for training a computerized agent to perform complex human tasks such as highway steering. The scheme is designed to follow a natural learning process whereby a human instructor teaches a computerized trainee. The learning process consists of five elements: (i) unsupervised feature learning; (ii) supervised imitation learning; (iii) supervised reward induction; (iv) supervised safety module construction; and (v) reinforcement learning. We implemented the last four elements of the scheme using deep convolutional networks and applied the scheme to successfully create a computerized agent capable of autonomous highway steering in the well-known racing game Assetto Corsa. We demonstrate that the use of the last four elements is essential to effectively carry out the steering task using vision alone, without access to the driving simulator's internals, and operating in wall-clock time. This is made possible in part through the introduction of a safety network, a novel mechanism for preventing the agent from making catastrophic mistakes during the reinforcement learning stage.
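
To make the structure of the scheme concrete, the sketch below outlines how the final reinforcement learning stage could be organized, with an induced reward network scoring transitions in place of a simulator-provided reward and a safety network vetoing actions predicted to be catastrophic. This is a minimal illustrative sketch, not the authors' implementation: all class names (Env, PolicyNet, RewardNet, SafetyNet), their interfaces, and the placeholder logic inside them are hypothetical.

```python
"""Minimal sketch of the reinforcement learning stage with an induced reward
network and a safety network, assuming hypothetical placeholder components."""

import random


class Env:
    """Stand-in for the vision-only driving environment (e.g. screen captures)."""
    def reset(self):
        return 0.0  # initial observation (a real system would return an image)

    def step(self, action):
        # next observation and an episode-done flag (placeholders)
        return random.random(), random.random() < 0.05


class PolicyNet:
    """Imitation-pretrained steering policy, later fine-tuned with RL."""
    def act(self, obs):
        return random.choice([-1, 0, 1])  # steer left / straight / right

    def safe_action(self, obs):
        return 0  # conservative fallback, e.g. keep the wheel straight

    def update(self, obs, action, reward, next_obs):
        pass  # the RL update (e.g. a Q-learning step) would go here


class RewardNet:
    """Supervised reward-induction network scoring (observation, action) pairs."""
    def score(self, obs, action):
        return -0.1 * abs(action) + random.random()  # placeholder reward signal


class SafetyNet:
    """Safety module predicting whether an action is likely to be catastrophic."""
    def is_unsafe(self, obs, action):
        return random.random() < 0.01  # placeholder classifier output


def reinforcement_learning_stage(env, policy, reward_net, safety_net, episodes=10):
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            action = policy.act(obs)
            if safety_net.is_unsafe(obs, action):
                # Safety network overrides actions predicted to be catastrophic
                action = policy.safe_action(obs)
            next_obs, done = env.step(action)
            # Induced reward replaces the unavailable simulator reward
            reward = reward_net.score(obs, action)
            policy.update(obs, action, reward, next_obs)
            obs = next_obs


if __name__ == "__main__":
    reinforcement_learning_stage(Env(), PolicyNet(), RewardNet(), SafetyNet())
```

In this reading, the safety module acts as a runtime filter between the policy and the environment, so exploration during reinforcement learning never executes an action the classifier flags as likely catastrophic.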
