Multilevel Feedback Joint Representation Learning Network Based on Adaptive Area Elimination for Cross-View Geo-Localization

Cross-view geo-localization is the task of matching images of the same geographic target captured from different platforms, such as drone and satellite views. Because the viewing angle differs greatly between platforms, this matching is challenging. To address this, we propose a multilevel feedback joint representation learning network based on adaptive area elimination. The network first processes the extracted global features to obtain part-level and patch-level features. These features are then fed back to the global features to extract the contextual information they contain and to improve the robustness of the extracted features. In addition, because images from different platforms differ, some interference is always present during matching. We therefore introduce an adaptive area elimination strategy that erases interfering information from the global features and helps the model focus on crucial information. On this basis, we design a feature correlation loss function that constrains learning when global feature information is used, suppressing possible interference and improving model performance. Finally, experiments on two well-known benchmarks, University-1652 and SUES-200, show that the proposed network achieves competitive results, demonstrating its effectiveness.
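The abstract does not say how the part-level and patch-level features or the eliminated area are computed, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes a global feature map of shape (C, H, W), concentric square rings for the part-level partition, non-overlapping square blocks for the patch-level partition, and a per-image quantile threshold on mean activation as the "adaptive" elimination rule. All function names and parameters (`part_level_features`, `patch_level_features`, `eliminate_low_response_area`, `n_parts`, `patch`, `keep_ratio`) are hypothetical.

```python
import numpy as np


def part_level_features(fmap, n_parts=2):
    """Average-pool a (C, H, W) map over concentric square rings.

    Assumption: the ring partition is illustrative only. n_parts should not
    exceed min(H, W) // 2, otherwise some rings may be empty.
    """
    c, h, w = fmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Normalized Chebyshev distance of each cell from the map centre.
    dist = np.maximum(np.abs(ys - (h - 1) / 2) / (h / 2),
                      np.abs(xs - (w - 1) / 2) / (w / 2))
    ring = np.minimum((dist * n_parts).astype(int), n_parts - 1)
    # One C-dimensional descriptor per ring, innermost ring first.
    return np.stack([fmap[:, ring == k].mean(axis=1) for k in range(n_parts)])


def patch_level_features(fmap, patch=2):
    """Average-pool non-overlapping patch x patch blocks.

    Assumption: H and W are divisible by `patch`.
    Returns an array of shape (num_patches, C).
    """
    c, h, w = fmap.shape
    blocks = fmap.reshape(c, h // patch, patch, w // patch, patch)
    return blocks.mean(axis=(2, 4)).reshape(c, -1).T


def eliminate_low_response_area(fmap, keep_ratio=0.75):
    """Zero out the spatial cells with the lowest mean absolute activation.

    Assumption: the threshold is a quantile computed per input, which is one
    simple way to make the erased area depend on the image content.
    Returns the masked map and the boolean keep-mask of shape (H, W).
    """
    energy = np.abs(fmap).mean(axis=0)
    threshold = np.quantile(energy, 1.0 - keep_ratio)
    mask = energy >= threshold
    return fmap * mask, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    global_map = rng.normal(size=(8, 4, 4))  # toy stand-in for backbone output
    cleaned, keep = eliminate_low_response_area(global_map)
    print(part_level_features(cleaned).shape)
    print(patch_level_features(cleaned).shape)
    print(int(keep.sum()), "of", keep.size, "cells kept")
```

In this sketch the part-level and patch-level descriptors are returned as separate arrays; how they are fed back into the global features, and how the feature correlation loss is defined, are not specified in the abstract and are left out.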
