Unsupervised Domain Adaptation for 6DOF Indoor Localization

Visual localization is gathering increasing attention in computer vision due to the spread of wearable cameras (e.g., smart glasses) and the growing interest in autonomous vehicles and robots. Unfortunately, current localization algorithms rely on large amounts of labeled training data collected in the specific target environment in which the system needs to work. Data collection and labeling in this context are difficult and time-consuming. Moreover, the process has to be repeated whenever the system is adapted to a new environment. In this work, we consider a scenario in which the target environment has been scanned to obtain a 3D model of the scene, which can be used to generate large quantities of synthetic data automatically paired with localization labels. We hence investigate the use of unsupervised domain adaptation techniques that exploit labeled synthetic data and unlabeled real data to train localization algorithms. To carry out the study, we introduce a new dataset of synthetic and real images, labeled with their 6-DOF poses and collected in four different indoor rooms. The dataset is available at https://iplab.dmi.unict.it/EGO-CH-LOC-UDA. We then propose a new method based on self-supervision and attention modules and test it on this dataset. Results show that our method improves over baselines and state-of-the-art algorithms tackling similar domain adaptation tasks.
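The 6-DOF labels mentioned above pair each image with a camera position and orientation. As a minimal sketch, assuming positions are 3D vectors and orientations are quaternions in (w, x, y, z) order, the snippet below shows how such labels are commonly compared: position error as Euclidean distance, orientation error as the angle of the relative rotation, and a PoseNet-style weighted regression loss. All function names are hypothetical, `beta` is left without a recommended value, and this is not the self-supervised, attention-based method proposed in this work.

```python
import math
from typing import Tuple

# Hypothetical pose representation: position (x, y, z) plus quaternion (w, x, y, z).
Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]


def normalize(q: Quat) -> Quat:
    """Scale a quaternion to unit length so it represents a rotation."""
    n = math.sqrt(sum(c * c for c in q))
    if n == 0.0:
        raise ValueError("a zero quaternion does not represent a rotation")
    w, x, y, z = (c / n for c in q)
    return (w, x, y, z)


def translation_error(t_pred: Vec3, t_true: Vec3) -> float:
    """Euclidean distance between predicted and ground-truth camera positions."""
    return math.dist(t_pred, t_true)


def rotation_error_deg(q_pred: Quat, q_true: Quat) -> float:
    """Angle (degrees) of the relative rotation between two orientations.

    abs() makes q and -q, which encode the same rotation, give zero error.
    """
    a, b = normalize(q_pred), normalize(q_true)
    dot = min(1.0, abs(sum(u * v for u, v in zip(a, b))))  # clamp rounding error
    return math.degrees(2.0 * math.acos(dot))


def posenet_style_loss(t_pred: Vec3, q_pred: Quat,
                       t_true: Vec3, q_true: Quat, beta: float) -> float:
    """Hypothetical weighted sum of position error and quaternion distance.

    beta balances scene units against unit-quaternion distance; its value is
    scene dependent. Unlike rotation_error_deg, this term treats q and -q as
    different rotations.
    """
    position_term = math.dist(t_pred, t_true)
    orientation_term = math.dist(normalize(q_pred), normalize(q_true))
    return position_term + beta * orientation_term


if __name__ == "__main__":
    identity = (1.0, 0.0, 0.0, 0.0)
    yaw_90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))  # 90 deg about z
    print(translation_error((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))  # 5.0
    print(round(rotation_error_deg(identity, yaw_90), 6))        # 90.0
    print(rotation_error_deg(identity, (-1.0, 0.0, 0.0, 0.0)))   # 0.0
```

Reporting position and orientation errors separately avoids choosing a unit conversion between meters and degrees; the weighted loss, by contrast, needs `beta` precisely because it must combine the two into a single training objective.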