Domain Adaptation of Networks for Camera Pose Estimation: Learning Camera Pose Estimation Without Pose Labels

One of the key criticisms of deep learning is that large amounts of expensive and difficult-to-acquire training data are required in order to train models with high performance and good generalization capabilities. Focusing on the task of monocular camera pose estimation via scene coordinate regression (SCR), we describe a novel method, Domain Adaptation of Networks for Camera pose Estimation (DANCE), which enables the training of models without access to any labels on the target task. DANCE requires unlabeled images (without known poses, ordering, or scene coordinate labels) and a 3D representation of the space (e.g., a scanned point cloud), both of which can be captured with minimal effort using off-the-shelf commodity hardware. DANCE renders labeled synthetic images from the 3D model, and bridges the inevitable domain gap between synthetic and real images by applying unsupervised image-level domain adaptation techniques (unpaired image-to-image translation). When tested on real images, the SCR model trained with DANCE achieved comparable performance to its fully supervised counterpart (in both cases using PnP-RANSAC for final pose estimation) at a fraction of the cost. Our code and dataset are available at https://github.com/JackLangerman/dance

[1]  Tobias Höllerer,et al.  Simulation based Camera Localization under a Variable Lighting Environment , 2016, ICAT-EGVE.

[2]  Karthik Dantu,et al.  Augmenting visual SLAM with Wi-Fi sensing for indoor applications , 2019, Autonomous Robots.

[3]  Marc Pollefeys,et al.  IMAGE-TO-IMAGE TRANSLATION FOR ENHANCED FEATURE MATCHING, IMAGE RETRIEVAL AND VISUAL LOCALIZATION , 2019, ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences.

[4]  Masatoshi Okutomi,et al.  24/7 Place Recognition by View Synthesis , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Andrew W. Fitzgibbon,et al.  Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Andrew W. Fitzgibbon,et al.  Exploiting uncertainty in regression forests for accurate camera relocalization , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Shuda Li,et al.  RelocNet: Continuous Metric Learning Relocalisation Using Neural Nets , 2018, ECCV.

[9]  Xiangyang Ji,et al.  Robust RGB-based 6-DoF Pose Estimation without Real Pose Annotations , 2020, ArXiv.

[10]  Paul Newman,et al.  Appearance-only SLAM at large scale with FAB-MAP 2.0 , 2011, Int. J. Robotics Res..

[11]  Torsten Sattler,et al.  InLoc: Indoor Visual Localization with Dense Matching and View Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Kostas E. Bekris,et al.  se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains , 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[13]  Roland Siegwart,et al.  From Coarse to Fine: Robust Hierarchical Localization at Large Scale , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Trevor Darrell,et al.  Adversarial Discriminative Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[16]  Harshad Rai,et al.  Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks , 2018 .

[17]  Alexei A. Efros,et al.  Contrastive Learning for Unpaired Image-to-Image Translation , 2020, ECCV.

[18]  Eric Brachmann,et al.  Expert Sample Consensus Applied to Camera Re-Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Torsten Sattler,et al.  Camera Pose Voting for Large-Scale Image-Based Localization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20]  Torsten Sattler,et al.  Efficient & Effective Prioritized Matching for Large-Scale Image-Based Localization , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Luigi di Stefano,et al.  Real-Time RGB-D Camera Pose Estimation in Novel Scenes Using a Relocalisation Cascade , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[23]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[24]  Dorian Gálvez-López,et al.  Bags of Binary Words for Fast Place Recognition in Image Sequences , 2012, IEEE Transactions on Robotics.

[25]  Andreas Möller,et al.  Fast relocalization for visual odometry using binary features , 2013, 2013 IEEE International Conference on Image Processing.

[26]  Marcin Andrychowicz,et al.  Solving Rubik's Cube with a Robot Hand , 2019, ArXiv.

[27]  Daniel Cremers,et al.  Image-Based Localization Using LSTMs for Structured Feature Correlation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[28]  Hannes Sommer,et al.  SegMap: Segment-based mapping and localization using data-driven descriptors , 2019, Int. J. Robotics Res..

[29]  Taesung Park,et al.  CyCADA: Cycle-Consistent Adversarial Domain Adaptation , 2017, ICML.

[30]  Eric Brachmann,et al.  Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Luigi di Stefano,et al.  On-the-Fly Adaptation of Regression Forests for Online Camera Relocalisation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Nuno Vasconcelos,et al.  Bidirectional Learning for Domain Adaptation of Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Cyrill Stachniss,et al.  Effective Visual Place Recognition Using Multi-Sequence Maps , 2019, IEEE Robotics and Automation Letters.

[34]  Eric Brachmann,et al.  Visual Camera Re-Localization from RGB and RGB-D Images Using DSAC , 2020, ArXiv.

[35]  Rafael Muñoz-Salinas,et al.  UcoSLAM: Simultaneous Localization and Mapping by Fusion of KeyPoints and Squared Planar Markers , 2019, Pattern Recognit..

[36]  Michael Bosse,et al.  Large-scale, real-time visual–inertial localization revisited , 2019, Int. J. Robotics Res..

[37]  Gabriela Csurka,et al.  Robust Image Retrieval-based Visual Localization using Kapture , 2020, ArXiv.

[38]  Torsten Sattler,et al.  Understanding the Limitations of CNN-Based Absolute Camera Pose Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Wolfram Burgard,et al.  VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry , 2018, IEEE Robotics and Automation Letters.

[40]  Vincent Lepetit,et al.  Back to the Feature: Learning Robust Camera Localization from Pixels to Pose , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Haruo Takemura,et al.  REST: Real-to-Synthetic Transform for Illumination Invariant Camera Localization , 2018, ArXiv.

[42]  Michitaka Hirose,et al.  Deep Radio-Visual Localization , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[43]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).