ADAPS: Autonomous Driving Via Principled Simulations

Autonomous driving has seen significant advances in recent years. However, obtaining a robust control policy for driving remains challenging, as it requires training data covering a variety of scenarios (including rare situations such as accidents), an effective policy architecture, and an efficient learning mechanism. We propose ADAPS for producing robust control policies for autonomous vehicles. ADAPS consists of two simulation platforms for generating and analyzing accidents to automatically produce labeled training data, and a memory-enabled hierarchical control policy. Additionally, ADAPS offers a more efficient online learning mechanism that reduces the number of iterations required for learning compared to existing methods such as DAGGER [40]. We present both theoretical and experimental results. The latter are obtained in simulated environments, where both qualitative and quantitative results demonstrate the benefits of ADAPS.
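Since the efficiency claim above is benchmarked against DAGGER [40], a minimal sketch of that baseline's online imitation loop may help frame the comparison. This is a generic illustration, not ADAPS itself; env, expert_action, and train_policy are hypothetical placeholders rather than any interface from the paper.

```python
# Generic DAGGER-style online imitation learning loop [40].
# All names (env, expert_action, train_policy) are hypothetical
# stand-ins; this sketch is NOT the ADAPS method itself.

def dagger(env, expert_action, train_policy, n_iters=10, horizon=100):
    # Warm start: one expert rollout, i.e., plain behavior-cloning data.
    dataset, state = [], env.reset()
    for _ in range(horizon):
        action = expert_action(state)
        dataset.append((state, action))
        state = env.step(action)
    policy = train_policy(dataset)

    # Each iteration rolls out the *learner*, labels the visited states
    # with the expert, aggregates the data, and retrains the policy.
    for _ in range(n_iters):
        state = env.reset()
        for _ in range(horizon):
            dataset.append((state, expert_action(state)))
            state = env.step(policy(state))  # act with the learner
        policy = train_policy(dataset)       # retrain on the aggregate
    return policy
```

Every outer iteration here costs a fresh round of expert queries and retraining; the abstract's claim is that ADAPS's simulation-generated, automatically labeled accident data reduces how many such iterations are needed.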

[1] Dean Pomerleau et al. ALVINN: an autonomous land vehicle in a neural network. 1988.

[2] John Langford et al. Search-based structured prediction. Machine Learning, 2009.

[3] Yann LeCun et al. Off-Road Obstacle Avoidance through End-to-End Learning. NIPS, 2005.

[4] Yishay Mansour et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes. Machine Learning, 1999.

[5] Robert E. Schapire et al. A Reduction from Apprenticeship Learning to Classification. NIPS, 2010.

[6] Jeff G. Schneider et al. Policy Search by Dynamic Programming. NIPS, 2003.

[7] Jianxiong Xiao et al. DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving. ICCV, 2015.

[8] Daniel V. McGehee et al. Driver Reaction Time in Crash Avoidance Research: Validation of a Driving Simulator Study on a Test Track. 2000.

[9] Xin Zhang et al. End to End Learning for Self-Driving Cars. arXiv, 2016.

[10] Jean-Claude Latombe et al. An Approach to Automatic Robot Programming Based on Inductive Learning. 1984.

[11] Brett Browning et al. A survey of robot learning from demonstration. Robotics and Autonomous Systems, 2009.

[12] Csaba Szepesvári et al. Finite time bounds for sampling based fitted value iteration. ICML, 2005.

[13] Ming C. Lin et al. City-scale traffic animation using statistical learning and metamodel-based optimization. ACM Transactions on Graphics, 2017.

[14] Sridhar Mahadevan et al. Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems, 2003.

[15] Sergey Levine et al. Guided Policy Search. ICML, 2013.

[16] Nathan Ratliff et al. Learning to search: structured prediction techniques for imitation learning. 2009.

[17] Jürgen Schmidhuber et al. Long Short-Term Memory. Neural Computation, 1997.

[18] Long-Ji Lin et al. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 1992.

[19] Ming C. Lin et al. WarpDriver: context-aware probabilistic motion prediction for crowd simulation. ACM Transactions on Graphics, 2016.

[20] G. Johansson et al. Drivers' Brake Reaction Times. Human Factors, 1971.

[21] Kyunghyun Cho et al. Query-Efficient Imitation Learning for End-to-End Simulated Driving. AAAI, 2017.

[22] Ambuj Tewari et al. On the Generalization Ability of Online Strongly Convex Programming Algorithms. NIPS, 2008.

[23] Ming C. Lin et al. A Survey on Visual Traffic Simulation: Models, Evaluations, and Applications in Autonomous Driving. Computer Graphics Forum, 2019.

[24] Scott Kuindersma et al. Robot learning from demonstration by constructing skill trees. International Journal of Robotics Research, 2012.

[25] Doina Precup et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 1999.

[26] David Silver et al. Learning Preference Models for Autonomous Mobile Robots in Complex Domains. 2010.

[27] Anind K. Dey et al. Maximum Entropy Inverse Reinforcement Learning. AAAI, 2008.

[28] Byron Boots et al. Agile Off-Road Autonomous Driving Using End-to-End Deep Imitation Learning. arXiv, 2017.

[29] Elad Hazan et al. Logarithmic regret algorithms for online convex optimization. Machine Learning, 2006.

[30] Alexey Dosovitskiy et al. End-to-End Driving Via Conditional Imitation Learning. ICRA, 2018.

[31] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming. ICML, 1995.

[32] John Langford et al. Approximately Optimal Approximate Reinforcement Learning. ICML, 2002.

[33] Martial Hebert et al. Learning monocular reactive UAV control in cluttered natural environments. ICRA, 2013.

[34] Guigang Zhang et al. Deep Learning. International Journal of Semantic Computing, 2016.

[35] Jimmy Ba et al. Adam: A Method for Stochastic Optimization. ICLR, 2014.

[36] David M. Bradley et al. Boosting Structured Prediction for Imitation Learning. NIPS, 2006.

[37] Pieter Abbeel et al. Apprenticeship learning via inverse reinforcement learning. ICML, 2004.

[38] Yang Gao et al. End-to-End Learning of Driving Models from Large-Scale Video Datasets. CVPR, 2017.

[39] Geoffrey E. Hinton et al. Visualizing Data using t-SNE. 2008.

[40] Geoffrey J. Gordon et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. AISTATS, 2010.