DA4AD: End-to-end Deep Attention Aware Features Aided Visual Localization for Autonomous Driving

We present a visual localization framework aided by novel deep attention aware features for autonomous driving that achieves centimeter level localization accuracy. Conventional approaches to the visual localization problem rely on handcrafted features or human-made objects on the road. They are known to be either prone to unstable matching caused by severe appearance or lighting changes, or too scarce to deliver constant and robust localization results in challenging scenarios. In this work, we seek to exploit the deep attention mechanism to search for salient, distinctive and stable features that are good for longterm matching in the scene through a novel end-to-end deep neural network. Furthermore, our learned feature descriptors are demonstrated to be competent to establish robust matches and therefore successfully estimate the optimal camera poses with high precision. We comprehensively validate the effectiveness of our method using a freshly collected dataset with high-quality ground truth trajectories and hardware synchronization between sensors. Results demonstrate that our method achieves a competitive localization accuracy when compared to the LiDAR-based localization solutions under various challenging circumstances, leading to a potential low-cost localization solution for autonomous driving.

[1]  Torsten Sattler,et al.  Understanding the Limitations of CNN-Based Absolute Camera Pose Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Ryan M. Eustice,et al.  Robust LIDAR localization using multiresolution Gaussian mixture maps for autonomous driving , 2017, Int. J. Robotics Res..

[3]  Yehoshua Y. Zeevi,et al.  The farthest point strategy for progressive image sampling , 1997, IEEE Trans. Image Process..

[4]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[5]  Vincent Lepetit,et al.  LIFT: Learned Invariant Feature Transform , 2016, ECCV.

[6]  Vincent Lepetit,et al.  BRIEF: Binary Robust Independent Elementary Features , 2010, ECCV.

[7]  Paul Newman,et al.  LAPS - localisation using appearance of prior structure: 6-DoF monocular camera localisation using prior pointclouds , 2012, 2012 IEEE International Conference on Robotics and Automation.

[8]  Sebastian Thrun,et al.  Robust vehicle localization in urban environments using probabilistic maps , 2010, 2010 IEEE International Conference on Robotics and Automation.

[9]  Lu Feng,et al.  A robust pose graph approach for city scale LiDAR mapping , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[10]  Shenhua Hou,et al.  LiDAR Inertial Odometry Aided Robust LiDAR Localization System in Changing City Scenes , 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[11]  Hao Wang,et al.  Robust and Precise Vehicle Localization Based on Multi-Sensor Fusion in Diverse City Scenes , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[12]  Torsten Sattler,et al.  Semantic Match Consistency for Long-Term Visual Localization , 2018, ECCV.

[13]  Ryan M. Eustice,et al.  Fast LIDAR localization using multiresolution Gaussian mixture maps , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[14]  Stefan Schubert,et al.  Sampling-based methods for visual navigation in 3D maps by synthesizing depth images , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[15]  Jan Kautz,et al.  Geometry-Aware Learning of Maps for Camera Localization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Hongbin Zha,et al.  Monocular visual localization using road structural features , 2014, 2014 IEEE Intelligent Vehicles Symposium Proceedings.

[17]  Nanning Zheng,et al.  Real-time global localization of intelligent road vehicles in lane-level via lane marking detection and shape registration , 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[18]  Paul Newman,et al.  1 year, 1000 km: The Oxford RobotCar dataset , 2017, Int. J. Robotics Res..

[19]  Julius Ziegler,et al.  Video based localization for Bertha , 2014, 2014 IEEE Intelligent Vehicles Symposium Proceedings.

[20]  Renaud Dubé,et al.  VIZARD: Reliable Visual Localization for Autonomous Vehicles in Urban Outdoor Environments , 2019, 2019 IEEE Intelligent Vehicles Symposium (IV).

[21]  Ruigang Yang,et al.  DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Paul Newman,et al.  Work smart, not hard: Recalling relevant experiences for vast-scale but time-constrained localisation , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[23]  Torsten Sattler,et al.  Quad-Networks: Unsupervised Learning to Rank for Interest Point Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Wolfram Burgard,et al.  VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry , 2018, IEEE Robotics and Automation Letters.

[25]  Henning Lategahn,et al.  Vision-Only Localization , 2014, IEEE Transactions on Intelligent Transportation Systems.

[26]  Hugo Germain,et al.  Sparse-to-Dense Hypercolumn Matching for Long-Term Visual Localization , 2019, 2019 International Conference on 3D Vision (3DV).

[27]  Paul Newman,et al.  Direct Visual Localisation and Calibration for Road Vehicles in Changing City Environments , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[28]  Shenhua Hou,et al.  L3-Net: Towards Learning Based LiDAR Localization for Autonomous Driving , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Paul Newman,et al.  Made to measure: Bespoke landmarks for 24-hour, all-weather localisation with a camera , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[30]  Yasuyuki Matsushita,et al.  Efficient Large-Scale Point Cloud Registration Using Loop Closures , 2015, 2015 International Conference on 3D Vision.

[31]  Paul Newman,et al.  LAPS-II: 6-DoF day and night visual localisation with prior 3D structure for autonomous road vehicles , 2014, 2014 IEEE Intelligent Vehicles Symposium Proceedings.

[32]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[33]  Wolfram Burgard,et al.  Deep Auxiliary Learning for Visual Localization and Odometry , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[34]  Roland Siegwart,et al.  From Coarse to Fine: Robust Hierarchical Localization at Large Scale , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Myoungho Sunwoo,et al.  Precise Localization of an Autonomous Car Based on Probabilistic Noise Models of Road Surface Marker Features Using Multiple Cameras , 2015, IEEE Transactions on Intelligent Transportation Systems.

[37]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[38]  Wolfram Burgard,et al.  Deep regression for monocular camera-based 6-DoF global localization in outdoor environments , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[39]  Raquel Urtasun,et al.  Learning to Localize Using a LiDAR Intensity Map , 2018, CoRL.

[40]  Tao Wu,et al.  Vehicle localization using road markings , 2013, 2013 IEEE Intelligent Vehicles Symposium (IV).

[41]  Torsten Sattler,et al.  InLoc: Indoor Visual Localization with Dense Matching and View Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42]  Ho Gi Jung,et al.  Sensor Fusion-Based Low-Cost Vehicle Localization System for Complex Urban Environments , 2017, IEEE Transactions on Intelligent Transportation Systems.

[43]  Torsten Sattler,et al.  D2-Net: A Trainable CNN for Joint Description and Detection of Local Features , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Henning Lategahn,et al.  How to learn an illumination robust image feature for place recognition , 2013, 2013 IEEE Intelligent Vehicles Symposium (IV).

[45]  Ryan M. Eustice,et al.  University of Michigan North Campus long-term vision and lidar dataset , 2016, Int. J. Robotics Res..

[46]  Sven Behnke,et al.  Efficient Continuous-Time SLAM for 3D Lidar-Based Online Mapping , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[47]  Yu Chen,et al.  EnforceNet: Monocular Camera Localization in Large Scale Indoor Sparse LiDAR Point Cloud , 2019, ArXiv.

[48]  Yan Lu,et al.  Local Descriptors Optimized for Average Precision , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49]  Roberto Cipolla,et al.  Geometric Loss Functions for Camera Pose Regression with Deep Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Wolfram Burgard,et al.  Monocular camera localization in 3D LiDAR maps , 2016, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[51]  Sebastian Thrun,et al.  Map-Based Precision Vehicle Localization in Urban Environments , 2007, Robotics: Science and Systems.

[52]  Shiyu Song,et al.  DeepVCP: An End-to-End Deep Neural Network for Point Cloud Registration , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[53]  Ryan M. Eustice,et al.  Ford Campus vision and lidar data set , 2011, Int. J. Robotics Res..

[54]  Bohyung Han,et al.  Large-Scale Image Retrieval with Attentive Deep Local Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[55]  Julius Ziegler,et al.  Urban localization with camera and inertial measurement unit , 2013, 2013 IEEE Intelligent Vehicles Symposium (IV).

[56]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Nanning Zheng,et al.  Real-Time Global Localization of Robotic Cars in Lane Level via Lane Marking Detection and Shape Registration , 2016, IEEE Transactions on Intelligent Transportation Systems.

[58]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Daniel Cremers,et al.  Image-Based Localization Using LSTMs for Structured Feature Correlation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[60]  Markus Schreiber,et al.  LaneLoc: Lane marking based localization using highly accurate maps , 2013, 2013 IEEE Intelligent Vehicles Symposium (IV).

[61]  Tomasz Malisiewicz,et al.  SuperPoint: Self-Supervised Interest Point Detection and Description , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[62]  Ryan M. Eustice,et al.  Visual localization within LIDAR maps for automated urban driving , 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[63]  Torsten Sattler,et al.  Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[64]  Robert M. Haralick,et al.  Review and analysis of solutions of the three point perspective pose estimation problem , 1994, International Journal of Computer Vision.

[65]  Tao Wu,et al.  Light-weight localization for vehicles using road markings , 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[66]  Pierre Vandergheynst,et al.  FREAK: Fast Retina Keypoint , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[67]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).