Perceive, Attend, and Drive: Learning Spatial Attention for Safe Self-Driving

In this paper, we propose an end-to-end self-driving network featuring a sparse attention module that learns to automatically attend to important regions of the input. The attention module specifically targets motion planning, whereas prior literature applied attention only to perception tasks. Learning an attention mask aimed directly at motion planning significantly improves planner safety by concentrating computation on those regions. Furthermore, visualizing the attention improves the interpretability of end-to-end self-driving.
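As a rough illustration of what a sparse spatial attention mask does to a feature map, the sketch below gates a grid of features with a binary mask. It is not the network described in the paper: the top-k selection rule, the `keep_ratio` parameter, the function names, and the random inputs are all assumptions chosen for illustration, and the paper's mask is learned end to end rather than picked by a fixed rule.

```python
import numpy as np

def sparse_attention_mask(scores, keep_ratio):
    """Return a binary (H, W) mask keeping the top `keep_ratio` fraction of cells.

    Hypothetical stand-in for a learned attention module: here the mask is
    simply the k highest-scoring cells of a given score map.
    """
    flat = scores.ravel()
    k = max(1, int(round(keep_ratio * flat.size)))
    # Indices of the k highest-scoring cells (order among them is arbitrary).
    top = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.size, dtype=scores.dtype)
    mask[top] = 1.0
    return mask.reshape(scores.shape)

def apply_attention(features, mask):
    """Gate a (C, H, W) feature map with an (H, W) mask via broadcasting."""
    return features * mask[None, :, :]

# Illustrative inputs only: random features and scores on a 16x16 grid.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16, 16))
scores = rng.normal(size=(16, 16))

mask = sparse_attention_mask(scores, keep_ratio=0.25)
gated = apply_attention(features, mask)
```

Cells outside the mask are zeroed in every channel, so a downstream planner that consumes `gated` only sees the attended regions. Plotting `mask` over the input grid is one simple way to inspect where such a module is looking.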
