DeepDSAIR: Deep 6-DOF camera relocalization using deblurred semantic-aware image representation for large-scale outdoor environments

Abstract: Deep learning methods can provide fast, robust, and lightweight models for 6-DOF camera relocalization in large-scale outdoor environments. However, images captured in such environments have two significant characteristics: moving objects, which should not be included in the representation of an environment, and motion blur, which is common in images captured with moving cameras. None of the existing approaches addresses these two problems. This paper, for the first time, proposes a deep network architecture that is trained on knowledge obtained by combining deblurring and semantic segmentation modules, and examines the effect of this combination on a challenging dataset. Results show improvements of approximately 20% and 50% in camera position and orientation relocalization error, respectively.
