Global visual localization in LiDAR-maps through shared 2D-3D embedding space

Global localization is an important and widely studied problem for many robotic applications. Place recognition approaches can be exploited to solve this task, e.g., in the autonomous driving field. While most vision-based approaches match an image w.r.t. an image database, global visual localization within LiDAR-maps remains fairly unexplored, even though the path toward high definition 3D maps, produced mainly from LiDARs, is clear. In this work we leverage Deep Neural Network (DNN) approaches to create a shared embedding space between images and LiDAR-maps, allowing for image to 3D-LiDAR place recognition. We trained a 2D and a 3D DNN that create embeddings, respectively from images and from point clouds, that are close to each other whether they refer to the same place. An extensive experimental activity is presented to assess the effectiveness of the approach w.r.t. different learning paradigms, network architectures, and loss functions. All the evaluations have been performed using the Oxford Robotcar Dataset, which encompasses a wide range of weather and light conditions.

[1]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[2]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[3]  Marcelo H. Ang,et al.  2D3D-Matchnet: Learning To Match Keypoints Across 2D Image And 3D Point Cloud , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[4]  Wolfram Burgard,et al.  CMRNet: Camera to LiDAR-Map Registration , 2019, 2019 IEEE Intelligent Transportation Systems Conference (ITSC).

[5]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[7]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[8]  Tomás Pajdla,et al.  NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Ryan M. Eustice,et al.  Visual localization within LIDAR maps for automated urban driving , 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[10]  Jiri Matas,et al.  Total recall II: Query expansion revisited , 2011, CVPR 2011.

[11]  Andrew W. Fitzgibbon,et al.  Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[13]  Hesheng Wang,et al.  LPD-Net: 3D Point Cloud Learning for Large-Scale Place Recognition and Environment Analysis , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Vincent Frémont,et al.  Mapping and localization using GPS, lane markings and proprioceptive sensors , 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[15]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Chunxia Xiao,et al.  PCAN: 3D Attention Map Learning Using Contextual Information for Point Cloud Based Retrieval , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Ronen Basri,et al.  A Survey on Structure from Motion , 2017, ArXiv.

[19]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20]  Xiaolong Hu,et al.  Autonomous Driving in the iCity—HD Maps as a Key Challenge of the Automotive Industry , 2016 .

[21]  Augusto Luis Ballardini,et al.  Visual Localization at Intersections with Digital Maps , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[22]  Wolfram Burgard,et al.  VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry , 2018, IEEE Robotics and Automation Letters.

[23]  Ian D. Reid,et al.  Article in Press Robotics and Autonomous Systems ( ) – Robotics and Autonomous Systems a Comparison of Loop Closing Techniques in Monocular Slam , 2022 .

[24]  Silvio Savarese,et al.  Deep Metric Learning via Lifted Structured Feature Embedding , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[26]  Wolfram Burgard,et al.  Monocular camera localization in 3D LiDAR maps , 2016, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[27]  Kihyuk Sohn,et al.  Improved Deep Metric Learning with Multi-class N-pair Loss Objective , 2016, NIPS.

[28]  Bo Li,et al.  SECOND: Sparsely Embedded Convolutional Detection , 2018, Sensors.

[29]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Augusto Luis Ballardini,et al.  Leveraging the OSM building data to enhance the localization of an urban vehicle , 2016, 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC).

[31]  Paul Newman,et al.  SLAM-Loop Closing with Visually Salient Features , 2005, Proceedings of the 2005 IEEE International Conference on Robotics and Automation.

[32]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Markus Schreiber,et al.  LaneLoc: Lane marking based localization using highly accurate maps , 2013, 2013 IEEE Intelligent Vehicles Symposium (IV).

[34]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[35]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[36]  Yu Zhong,et al.  Intrinsic shape signatures: A shape descriptor for 3D object recognition , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[37]  Gim Hee Lee,et al.  PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Paul Newman,et al.  1 year, 1000 km: The Oxford RobotCar dataset , 2017, Int. J. Robotics Res..