Learning Condition Invariant Features for Retrieval-Based Localization from 1M Images

Image features for retrieval-based localization must be invariant to dynamic objects (e.g. cars) as well as seasonal and daytime changes. Such invariances are, up to some extent, learnable with existing methods using triplet-like losses, given a large number of diverse training images. However, due to the high algorithmic training complexity, there exists insufficient comparison between different loss functions on large datasets. In this paper, we train and evaluate several localization methods on three different benchmark datasets, including Oxford RobotCar with over one million images. This large scale evaluation yields valuable insights into the generalizability and performance of retrieval-based localization. Based on our findings, we develop a novel method for learning more accurate and better generalizing localization features. It consists of two main contributions: (i) a feature volume-based loss function, and (ii) hard positive and pairwise negative mining. On the challenging Oxford RobotCar night condition, our method outperforms the well-known triplet loss by 24.4% in localization accuracy within 5m.

[1]  Gim Hee Lee,et al.  PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Huimin Lu,et al.  Deep adversarial metric learning for cross-modal retrieval , 2019, World Wide Web.

[3]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Tengyuan Liang,et al.  Law of log determinant of sample covariance matrix and optimal estimation of differential entropy for high-dimensional Gaussian distributions , 2013, Journal of Multivariate Analysis.

[5]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[6]  Masatoshi Okutomi,et al.  Visual Place Recognition with Repetitive Structures , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  A. M. Mathai Random p-content of a p-parallelotope in Euclidean n-space , 1999, Advances in Applied Probability.

[8]  Dacheng Tao,et al.  Deep Metric Learning With Tuplet Margin Loss , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[9]  Jian Wang,et al.  Deep Metric Learning with Angular Loss , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Valérie Gouet-Brunet,et al.  A survey on Visual-Based Localization: On the benefit of heterogeneous data , 2018, Pattern Recognit..

[11]  Xudong Lin,et al.  Deep Adversarial Metric Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Barbara Caputo,et al.  COLD: The CoSy Localization Database , 2009, Int. J. Robotics Res..

[13]  Andrew Zisserman,et al.  DisLocation: Scalable Descriptor Distinctiveness for Location Recognition , 2014, ACCV.

[14]  Silvio Savarese,et al.  Deep Metric Learning via Lifted Structured Feature Embedding , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Jan-Michael Frahm,et al.  From structure-from-motion point clouds to fast location recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Michel Dhome,et al.  Real Time Localization and 3D Reconstruction , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[17]  Abhishek Das,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[18]  Luc Van Gool,et al.  Night-to-Day Image Translation for Retrieval-based Localization , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[19]  Qi Tian,et al.  SIFT Meets CNN: A Decade Survey of Instance Retrieval , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[21]  Torsten Sattler,et al.  Camera Pose Voting for Large-Scale Image-Based Localization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Cyrill Stachniss,et al.  Lazy Data Association For Image Sequences Matching Under Substantial Appearance Changes , 2016, IEEE Robotics and Automation Letters.

[23]  Jiri Matas,et al.  Working hard to know your neighbor's margins: Local descriptor learning loss , 2017, NIPS.

[24]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Jiwen Lu,et al.  Deep Embedding Learning With Discriminative Sampling Policy , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Lei Zhou,et al.  Matchable Image Retrieval by Learning from Surface Reconstruction , 2018, ACCV.

[27]  Ronan Sicre,et al.  Particular object retrieval with integral max-pooling of CNN activations , 2015, ICLR.

[28]  Eugene H. Gover,et al.  Determinants and the volumes of parallelotopes and zonotopes , 2010 .

[29]  David W. Murray,et al.  Video-rate localization in multiple maps for wearable augmented reality , 2008, 2008 12th IEEE International Symposium on Wearable Computers.

[30]  Jorgen Nielsen The distribution of volume reductions induced by isotropic random projections , 1999, Advances in Applied Probability.

[31]  Torsten Sattler,et al.  Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization? , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Olivier Stasse,et al.  MonoSLAM: Real-Time Single Camera SLAM , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Lucas Beyer,et al.  In Defense of the Triplet Loss for Person Re-Identification , 2017, ArXiv.

[34]  Federico Magliani,et al.  An accurate retrieval through R-MAC+ descriptors for landmark recognition , 2018, ICDSC.

[35]  Tom Drummond,et al.  Scalable Monocular SLAM , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[36]  Michael Milford,et al.  Sequence searching with deep-learnt depth for condition- and viewpoint-invariant route-based place recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[37]  Ivan Laptev,et al.  Deep Metric Learning Beyond Binary Supervision , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Lei Zhou,et al.  ContextDesc: Local Descriptor Augmentation With Cross-Modality Context , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Simon Lacroix,et al.  Location graphs for visual place recognition , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[40]  Yang Hua,et al.  Ranked List Loss for Deep Metric Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Paul Newman,et al.  1 year, 1000 km: The Oxford RobotCar dataset , 2017, Int. J. Robotics Res..

[42]  Qi Qian,et al.  SoftTriple Loss: Deep Metric Learning Without Triplet Sampling , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  Martin Humenberger,et al.  R2D2: Reliable and Repeatable Detector and Descriptor , 2019, NeurIPS.

[44]  Weihong Deng,et al.  Adversarial Learning With Margin-Based Triplet Embedding Regularization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[45]  Jan-Michael Frahm,et al.  Learned Contextual Feature Reweighting for Image Geo-Localization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Titus Cieslewski,et al.  Data-Efficient Decentralized Visual SLAM , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[47]  Tomás Pajdla,et al.  NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[49]  Torsten Sattler,et al.  Efficient & Effective Prioritized Matching for Large-Scale Image-Based Localization , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[51]  Luc Van Gool,et al.  Geometrically Mappable Image Features , 2020, IEEE Robotics and Automation Letters.

[52]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[53]  Albert Gordo,et al.  End-to-End Learning of Deep Visual Representations for Image Retrieval , 2016, International Journal of Computer Vision.

[54]  Gordon Wyeth,et al.  SeqSLAM: Visual route-based navigation for sunny summer days and stormy winter nights , 2012, 2012 IEEE International Conference on Robotics and Automation.

[55]  Daniel P. Huttenlocher,et al.  Location Recognition Using Prioritized Feature Matching , 2010, ECCV.

[56]  Luc Van Gool,et al.  Mapping, Localization and Path Planning for Image-Based Navigation Using Visual Features and Map , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Torsten Sattler,et al.  Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[58]  Stéphane Dupont,et al.  Towards Good Practices for Image Retrieval Based on CNN Features , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[59]  P. J. Narayanan,et al.  Visibility Probability Structure from SfM Datasets and Applications , 2012, ECCV.

[60]  Torsten Sattler,et al.  D2-Net: A Trainable CNN for Joint Description and Detection of Local Features , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[61]  Bohyung Han,et al.  Large-Scale Image Retrieval with Attentive Deep Local Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[62]  Kaiqi Huang,et al.  Beyond Triplet Loss: A Deep Quadruplet Network for Person Re-identification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[63]  Sebastien Glaser,et al.  Simultaneous Localization and Mapping: A Survey of Current Trends in Autonomous Driving , 2017, IEEE Transactions on Intelligent Vehicles.