VNLSTM-PoseNet: A novel deep ConvNet for real-time 6-DOF camera relocalization in urban streets

ABSTRACT Image-based relocalization in outdoor environments has attracted renewed interest because it underpins many applications. PoseNet was the first method to use a Convolutional Neural Network (CNN) to estimate the camera pose from a single image in real time. To address the limited precision and robustness of PoseNet and its improved variants in complex environments, this paper proposes and implements a new visual relocalization method based on deep convolutional neural networks, VNLSTM-PoseNet. First, the method resizes the input image directly, without cropping, to increase the receptive field over the training image. Then, the images and their corresponding pose labels are fed into an improved Long Short-Term Memory based (LSTM-based) PoseNet for training, and the network is optimized with the Nadam optimizer. Finally, the trained network is used to localize query images and obtain the camera pose. Experimental results on public outdoor datasets show that VNLSTM-PoseNet substantially improves relocalization performance over existing state-of-the-art CNN-based methods.
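
The abstract names Nadam as the optimizer but gives no hyperparameters, so the following is only a minimal sketch of what a Nadam update looks like, not the authors' training code. It uses the commonly cited simplified form of Nadam (Adam with a Nesterov look-ahead on the momentum term, without a momentum-decay schedule). The function names `nadam_step` and `minimize_quadratic`, the toy objective, and all hyperparameter values are assumptions made for illustration.

```python
import math


def nadam_step(theta, grad, m, v, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one simplified Nadam update to a list of scalar parameters.

    theta, grad, m, v are equal-length lists of floats; t is the 1-based step
    count. All default hyperparameters are illustrative assumptions.
    """
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        # Exponential moving averages of the gradient and the squared gradient.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Bias correction, as in Adam.
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Nesterov look-ahead: blend corrected momentum with the current gradient.
        nesterov = beta1 * m_hat + (1.0 - beta1) * g / (1.0 - beta1 ** t)
        new_theta.append(p - lr * nesterov / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


def minimize_quadratic(steps=2000, lr=0.05):
    """Minimize the toy objective f(x, y) = (x - 3)^2 + (y + 1)^2 with Nadam."""
    theta = [0.0, 0.0]
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    for t in range(1, steps + 1):
        grad = [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]
        theta, m, v = nadam_step(theta, grad, m, v, t, lr=lr)
    return theta


if __name__ == "__main__":
    x, y = minimize_quadratic()
    print(f"estimated minimizer: x={x:.4f}, y={y:.4f}")
```

In a real pose-regression setting the parameters would be network weights and the gradient would come from a pose loss over position and orientation; a deep learning framework's built-in Nadam optimizer would normally be used instead of a hand-written step like this one.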
