Paper Information - Domain-Invariant Similarity Activation Map Contrastive Learning for Retrieval-Based Long-Term Visual Localization

Visual localization is a crucial component in mobile robot and autonomous driving applications. Image retrieval is an efficient and effective technique for image-based localization. However, drastic variability in environmental conditions, e.g. illumination, seasonal, and weather changes, severely degrades retrieval-based visual localization and makes it a challenging problem. In this work, a general architecture is first formulated probabilistically to extract domain-invariant features through multi-domain image translation. A novel gradient-weighted similarity activation mapping loss (Grad-SAM) is then incorporated for finer localization with high accuracy. We also propose a new adaptive triplet loss to boost metric learning of the embedding in a self-supervised manner. The final coarse-to-fine image retrieval pipeline is implemented as the sequential combination of models trained without and with the Grad-SAM loss. Extensive experiments on the CMU-Seasons dataset validate the effectiveness of the proposed approach. The strong generalization ability of our approach is verified on the RobotCar dataset using models pre-trained on the urban part of the CMU-Seasons dataset. Our performance is on par with, or even exceeds, state-of-the-art image-based localization baselines at medium or high precision, especially in challenging environments with illumination variance, vegetation, and night-time images. Moreover, real-site experiments validate the efficiency and effectiveness of the coarse-to-fine strategy for localization.
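The coarse-to-fine retrieval idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each database image already has two embeddings (one from a hypothetical "coarse" model and one from a hypothetical "fine" model), that the coarse stage shortlists the top-k candidates by cosine similarity, and that the fine stage re-ranks only that shortlist. The function names, the `top_k` parameter, and the toy vectors are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def coarse_to_fine_retrieve(
    query_coarse: Sequence[float],
    query_fine: Sequence[float],
    db_coarse: List[Sequence[float]],
    db_fine: List[Sequence[float]],
    top_k: int = 2,
) -> int:
    """Return the index of the best database image (illustrative sketch).

    Stage 1 (assumed): rank all database images by coarse-embedding similarity
    and keep the top_k candidates.
    Stage 2 (assumed): re-rank only those candidates by fine-embedding similarity.
    """
    coarse_scores = [cosine_similarity(query_coarse, d) for d in db_coarse]
    candidates = sorted(
        range(len(db_coarse)), key=lambda i: coarse_scores[i], reverse=True
    )[:top_k]
    return max(candidates, key=lambda i: cosine_similarity(query_fine, db_fine[i]))


# Toy data (made up for illustration): four database images, 2-D embeddings.
db_coarse = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
db_fine = [[0.2, 1.0], [1.0, 0.1], [1.0, 0.0], [1.0, 0.0]]
query_coarse = [1.0, 0.05]
query_fine = [1.0, 0.0]

best = coarse_to_fine_retrieve(query_coarse, query_fine, db_coarse, db_fine, top_k=2)
print(best)
```

In this toy setup the coarse stage keeps images 0 and 1, and the fine stage then prefers image 1 even though image 0 had the highest coarse score. Restricting the fine comparison to a shortlist is what keeps the second stage cheap. How the paper actually combines the two models may differ from this sketch.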
