Cross-View Visual Geo-Localization for Outdoor Augmented Reality

Precise estimation of global orientation and location is critical to a compelling outdoor Augmented Reality (AR) experience. We address geo-pose estimation by cross-view matching of query ground images against a geo-referenced aerial satellite image database. Neural network-based methods have recently shown state-of-the-art performance in cross-view matching. However, most prior work focuses only on location estimation and ignores orientation, which is insufficient for outdoor AR applications. We propose a new transformer-based neural network model and a modified triplet ranking loss for joint location and orientation estimation. Experiments on several benchmark cross-view geo-localization datasets show that our model achieves state-of-the-art performance. Furthermore, we extend single-image query-based geo-localization by using temporal information from a navigation pipeline for robust continuous geo-localization. Experiments on several large-scale real-world video sequences demonstrate that our approach enables high-precision and stable AR insertion.
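
For readers unfamiliar with the terms, the sketch below illustrates a standard bidirectional max-margin triplet ranking loss and nearest-neighbor retrieval over ground and aerial embeddings. It is not the modified loss proposed in this work, whose details the abstract does not give. The function names, the margin value of 0.2, and the use of cosine similarity are all assumptions made for illustration, and the loss assumes a batch of at least two matched pairs.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def triplet_ranking_loss(ground, aerial, margin=0.2):
    """Generic bidirectional triplet ranking loss (illustrative, not the paper's).

    ground[i] and aerial[i] are embeddings of a matching pair; every other
    item in the batch serves as a negative. Requires a batch of size >= 2.
    """
    sim = l2_normalize(ground) @ l2_normalize(aerial).T  # (B, B) similarities
    pos = np.diag(sim)                                   # matched-pair scores
    # Ground query i against non-matching aerial image j.
    cost_g2a = np.maximum(0.0, margin + sim - pos[:, None])
    # Aerial query j against non-matching ground image i.
    cost_a2g = np.maximum(0.0, margin + sim - pos[None, :])
    mask = ~np.eye(len(sim), dtype=bool)                 # drop positive pairs
    return (cost_g2a[mask].sum() + cost_a2g[mask].sum()) / (2 * mask.sum())


def retrieve(query, database):
    """Return, for each query embedding, the index of the closest database row."""
    sim = l2_normalize(query) @ l2_normalize(database).T
    return np.argmax(sim, axis=1)


if __name__ == "__main__":
    ground = np.eye(3)             # toy ground-image embeddings
    aerial = np.eye(3)[[1, 0, 2]]  # toy aerial database, first two rows swapped
    print(retrieve(ground, aerial))
    print(triplet_ranking_loss(ground, np.eye(3)))
```

In this toy setup the loss is zero when every matched pair is more similar than any mismatched pair by at least the margin, and retrieval reduces to a row-wise argmax over the similarity matrix.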
