Paper information - Learning to Drive a Real Car in 20 Minutes

Learning to Drive a Real Car in 20 Minutes

This paper describes our first experiments applying reinforcement learning to steer a real robot car. The applied method, neural fitted Q iteration (NFQ), is purely data-driven: it relies only on data collected directly from real-life experiments, i.e. no transition model and no simulation is used. The RL approach is based on learning a neural Q-value function, which means that no prior selection of the structure of the control law is required. We demonstrate that the controller is able to learn a steering task in less than 20 minutes directly on the real car. We consider this an important step towards the competitive application of neural Q-function-based RL methods in real-life environments.
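As a rough illustration of the batch, model-free idea behind fitted Q iteration, the sketch below repeatedly regresses a Q function onto targets computed from a fixed set of recorded transitions. It is not the paper's implementation: the toy lane-offset task, the names `step`, `features`, `fitted_q_iteration` and `greedy_action`, the cost definition, and the discount factor are all assumptions made for illustration, and a one-hot least-squares regressor stands in for the neural network that NFQ would use.

```python
import numpy as np

STATES = [-2, -1, 0, 1, 2]   # hypothetical discretised lateral offset
ACTIONS = [-1, 0, 1]         # hypothetical steering commands
GAMMA = 0.9                  # assumed discount factor


def step(s, a):
    """Toy dynamics, used only to record data; the learner never calls it."""
    s_next = int(np.clip(s + a, -2, 2))
    cost = 0.0 if s_next == 0 else 1.0   # zero cost only when centred
    return s_next, cost


def features(s, a):
    """One-hot encoding of a (state, action) pair."""
    x = np.zeros(len(STATES) * len(ACTIONS))
    x[STATES.index(s) * len(ACTIONS) + ACTIONS.index(a)] = 1.0
    return x


def fitted_q_iteration(transitions, n_iters=10):
    """Fit Q from a fixed batch of (s, a, s_next, cost) tuples."""
    X = np.array([features(s, a) for s, a, _, _ in transitions])
    w = np.zeros(X.shape[1])             # start from Q = 0 everywhere
    for _ in range(n_iters):
        # Target: immediate cost plus discounted best (minimal) next Q value.
        targets = np.array([
            c + GAMMA * min(features(s2, b) @ w for b in ACTIONS)
            for _, _, s2, c in transitions
        ])
        # Supervised regression step (NFQ would train a neural net here).
        w = np.linalg.lstsq(X, targets, rcond=None)[0]
    return w


def greedy_action(w, s):
    """Pick the action with the lowest estimated cost-to-go."""
    return min(ACTIONS, key=lambda a: features(s, a) @ w)


# Recorded batch: one transition per (state, action) pair.
transitions = [(s, a, *step(s, a)) for s in STATES for a in ACTIONS]
w = fitted_q_iteration(transitions)
policy = {s: greedy_action(w, s) for s in STATES}
```

The learner only ever sees the recorded tuples, which mirrors the abstract's point that no transition model or simulation is needed; in this toy task the resulting `policy` steers back towards offset 0 from either side.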

Martin A. Riedmiller | Michael Montemerlo | Hendrik Dahlkamp

[1] T. D. Gillespie, et al. Fundamentals of Vehicle Dynamics, 1992.

[2] Martin A. Riedmiller, et al. A direct adaptive method for faster backpropagation learning: the RPROP algorithm, 1993, IEEE International Conference on Neural Networks.

[3] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.

[4] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.

[5] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.

[6] Andrew G. Barto, et al. Reinforcement learning, 1998.

[7] Martin A. Riedmiller, et al. Reinforcement learning on an omnidirectional mobile robot, 2003, Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453).

[8] Ben Tse, et al. Autonomous Inverted Helicopter Flight via Reinforcement Learning, 2004, ISER.

[9] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.

[10] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.

[11] H. Sebastian Seung, et al. Learning to Walk in 20 Minutes, 2005.

[12] Liming Xiang, et al. Kernel-Based Reinforcement Learning, 2006, ICIC.

[13] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[14] Martin A. Riedmiller, et al. Neural Reinforcement Learning Controllers for a Real Robot Application, 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.

[15] C. J. Tomlin, et al. Autonomous Automobile Trajectory Tracking for Off-Road Driving: Controller Design, Experimental Validation and Racing, 2007, 2007 American Control Conference.