Convolutional neural network-based coarse initial position estimation of a monocular camera in large-scale 3D light detection and ranging maps

Initial position estimation in global maps is a prerequisite for accurate localization and plays a critical role in mobile robot navigation tasks. Global positioning system signals often become unreliable at disaster sites and in indoor areas, so other localization methods are needed to support robots in search and rescue. Many vision-based approaches focus on estimating a robot's position within prior maps acquired with cameras. Conventional methods need a coarse estimate of the initial position before they can precisely localize a camera in a given map. In contrast, we propose a novel approach that estimates the initial position of a monocular camera within a given 3D light detection and ranging map using a convolutional neural network, with no retraining required. It enables a mobile robot to estimate its own coarse position in 3D maps with only a monocular camera. The key idea of our work is to use depth information as intermediate data for retrieving a camera image within immense point clouds. We employ an unsupervised learning framework to predict depth from a single image. We then use a pretrained convolutional neural network model to generate depth image descriptors that serve as representations of places. We retrieve the position by computing similarity scores between the current depth image and the depth images projected from the 3D map. Experiments on the publicly available KITTI data sets demonstrate the efficiency and feasibility of the presented algorithm.
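
The retrieval step described above can be illustrated with a minimal sketch. It assumes that each depth image has already been reduced to a fixed-length descriptor vector and that similarity is measured by cosine similarity; the abstract does not state which similarity measure is used, so this choice, along with the function names `cosine_similarity` and `retrieve_position` and the toy data, is hypothetical and for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def retrieve_position(
    query: Sequence[float],
    map_descriptors: List[Sequence[float]],
    map_positions: List[Tuple[float, float]],
) -> Tuple[Tuple[float, float], float]:
    """Return the map position whose descriptor best matches the query.

    query           -- descriptor of the depth image predicted from the camera frame
    map_descriptors -- descriptors of depth images projected from the 3D map
    map_positions   -- position associated with each projected depth image
    """
    if not map_descriptors or len(map_descriptors) != len(map_positions):
        raise ValueError("need one position per map descriptor")
    scores = [cosine_similarity(query, d) for d in map_descriptors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return map_positions[best], scores[best]


# Toy data (made up): three candidate places with 3-dimensional descriptors.
map_descriptors = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.9], [1.0, 1.0, 1.0]]
map_positions = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
query = [1.0, 0.0, 1.0]

position, score = retrieve_position(query, map_descriptors, map_positions)
```

In this sketch the exhaustive scan over all map descriptors keeps the logic easy to follow; a large-scale map would likely need an approximate nearest-neighbor index instead, which is outside what the abstract describes.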
