HyNet: Local Descriptor with Hybrid Similarity Measure and Triplet Loss

Recent works show that local descriptor learning benefits from L2 normalisation; however, an in-depth analysis of this effect is missing from the literature. In this paper, we investigate how L2 normalisation affects the descriptor gradients back-propagated during training. Based on our observations, we propose HyNet, a new local descriptor that achieves state-of-the-art matching results. HyNet introduces a hybrid similarity measure for the triplet margin loss, a regularisation term that constrains the descriptor norm, and a new network architecture that applies L2 normalisation to all intermediate feature maps and to the output descriptors. HyNet surpasses previous methods by a significant margin on standard benchmarks covering patch matching, verification and retrieval, and it also outperforms full end-to-end methods on 3D reconstruction tasks.
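The abstract names a hybrid similarity for the triplet margin loss and a norm-constraining regulariser but does not give their formulas. The sketch below shows one way such pieces could fit together. It rests on assumptions of ours, not on definitions stated here. We assume the hybrid similarity is a weighted mix `alpha * <x, y> - (1 - alpha) * ||x - y||` of the inner product and the L2 distance between unit-normalised descriptors, and that the hardest (most similar) negative is used. We also assume the regulariser penalises the squared gap between the norms of the raw, pre-normalisation anchor and positive descriptors. The names `hybrid_similarity`, `hybrid_triplet_loss`, `alpha`, `margin` and `gamma`, and their default values, are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_norm(x: Vector) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(v * v for v in x))


def l2_normalise(x: Vector, eps: float = 1e-12) -> List[float]:
    """Scale x to unit L2 norm; eps avoids division by zero."""
    norm = max(l2_norm(x), eps)
    return [v / norm for v in x]


def hybrid_similarity(x: Vector, y: Vector, alpha: float = 0.5) -> float:
    """Hypothetical hybrid similarity: a weighted mix of the inner product
    (larger = more similar) and the negated L2 distance (larger = more similar)."""
    inner = sum(a * b for a, b in zip(x, y))
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return alpha * inner - (1.0 - alpha) * dist


def hybrid_triplet_loss(
    anchor_raw: Vector,
    positive_raw: Vector,
    negatives_raw: Sequence[Vector],
    margin: float = 1.0,
    alpha: float = 0.5,
    gamma: float = 0.1,
) -> float:
    """Triplet margin loss on the hybrid similarity of L2-normalised
    descriptors, using the hardest negative, plus an assumed penalty on the
    norm gap between the raw anchor and positive descriptors."""
    a = l2_normalise(anchor_raw)
    p = l2_normalise(positive_raw)
    s_pos = hybrid_similarity(a, p, alpha)
    # Hardest negative: the candidate most similar to the anchor.
    s_neg = max(hybrid_similarity(a, l2_normalise(n), alpha) for n in negatives_raw)
    # Hinge: the positive should beat the hardest negative by at least `margin`.
    triplet = max(0.0, margin + s_neg - s_pos)
    # Assumed norm regulariser, computed on the unnormalised descriptors.
    norm_gap = (l2_norm(anchor_raw) - l2_norm(positive_raw)) ** 2
    return triplet + gamma * norm_gap


loss = hybrid_triplet_loss([3.0, 4.0], [0.6, 0.8], [[0.0, 2.0], [-1.0, 0.0]])
print(round(loss, 4))
```

In this example the anchor and positive point in the same direction, so the triplet term is only about 0.58. Their raw norms differ (5 versus 1), however, so the assumed regulariser adds 0.1 × (5 - 1)² = 1.6, which pushes matching descriptors towards equal norms before normalisation. Plain Python lists keep the sketch self-contained. A real training setup would use a tensor library with automatic differentiation and batched negative mining.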
