DA4AD: End-to-End Deep Attention-Based Visual Localization for Autonomous Driving

We present a visual localization framework for autonomous driving that is based on novel deep attention-aware features and achieves centimeter-level localization accuracy. Conventional approaches to visual localization rely on either handcrafted features or human-made objects on the road. The former are prone to unstable matching under severe appearance or lighting changes, and the latter are too scarce to deliver constant and robust localization results in challenging scenarios. In this work, we exploit a deep attention mechanism, within a novel end-to-end deep neural network, to search the scene for salient, distinctive, and stable features that are well suited to long-term matching. Furthermore, we demonstrate that our learned feature descriptors establish robust matches and therefore allow the optimal camera poses to be estimated with high precision. We comprehensively validate the effectiveness of our method on a freshly collected dataset with high-quality ground-truth trajectories and hardware synchronization between sensors. The results show that our method achieves localization accuracy competitive with LiDAR-based localization solutions under various challenging circumstances, which points to a potential low-cost localization solution for autonomous driving.
