Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset

The Google-Landmarks-v2 dataset is the largest worldwide landmark dataset, characterized by a high degree of noise and diversity. We present our team smlyaka's novel landmark retrieval/recognition system, which is robust to such a noisy and diverse dataset. Our approach is based on deep convolutional neural networks with metric learning, trained with cosine-softmax-based losses. Deep metric learning methods are usually sensitive to noise, which can hinder learning a reliable metric. To address this issue, we develop an automated data cleaning system. In addition, we devise a discriminative re-ranking method that addresses the diversity of the dataset for landmark retrieval. Using these methods, we achieved 1st place in the Google Landmark Retrieval 2019 challenge and 3rd place in the Google Landmark Recognition 2019 challenge on Kaggle.
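
As a concrete illustration of the cosine-softmax family of losses mentioned above, the following is a minimal PyTorch sketch of an ArcFace-style additive-angular-margin head. The class name ArcMarginHead and the scale/margin values (s = 30.0, m = 0.3) are illustrative assumptions for this sketch, not settings taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcMarginHead(nn.Module):
    # Cosine-softmax classification head with an additive angular margin.
    # Hypothetical hyperparameter defaults; tune for the task at hand.
    def __init__(self, embedding_dim: int, num_classes: int,
                 s: float = 30.0, m: float = 0.3):
        super().__init__()
        # One weight vector per landmark class, compared by cosine similarity.
        self.weight = nn.Parameter(torch.empty(num_classes, embedding_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s = s  # scale applied to the cosine logits
        self.m = m  # additive angular margin, in radians

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between L2-normalized embeddings and class weights.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        # Recover angles and add the margin only on the ground-truth class.
        theta = torch.acos(cosine.clamp(-1.0 + 1e-7, 1.0 - 1e-7))
        target = F.one_hot(labels, num_classes=cosine.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cosine)
        # Standard softmax cross-entropy on the scaled, margin-adjusted logits.
        return F.cross_entropy(self.s * logits, labels)

In this family of methods, the margin head is used only during training: at inference time it is discarded, and the L2-normalized backbone embeddings are compared directly by nearest-neighbor search for retrieval and recognition.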
