SDGMNet: Statistic-based Dynamic Gradient Modulation for Local Descriptor Learning

Modifications on triplet loss that rescale the backpropagated gradients of special pairs have made significant progress on local descriptor learning. However, current gradient modulation strategies are mainly static so that they would suffer from changes of training phases or datasets. In this paper, we propose a dynamic gradient modulation, named SDGMNet, to improve triplet loss for local descriptor learning. The core of our method is formulating modulation functions with statistical characteristics which are estimated dynamically. Firstly, we perform deep analysis on back propagation of general tripletbased loss and introduce included angle for distance measure. On this basis, auto-focus modulation is employed to moderate the impact of statistically uncommon individual pairs in stochastic gradient descent optimization; probabilistic margin cuts off the gradients of proportional Siamese pairs that are believed to reach the optimum; power adjustment balances the total weights of negative pairs and positive pairs. Extensive experiments demonstrate that our novel descriptor surpasses previous stateof-the-arts on standard benchmarks including patch verification, matching and retrieval tasks.

[1]  Matthew R. Scott,et al.  Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Nanning Zheng,et al.  Kernelized Subspace Pooling for Deep Local Descriptors , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Taghi M. Khoshgoftaar,et al.  A survey on Image Data Augmentation for Deep Learning , 2019, Journal of Big Data.

[4]  Vincent Lepetit,et al.  LIFT: Learned Invariant Feature Transform , 2016, ECCV.

[5]  Junjun Jiang,et al.  Image Matching from Handcrafted to Deep Features: A Survey , 2020, International Journal of Computer Vision.

[6]  Xing Ji,et al.  CosFace: Large Margin Cosine Loss for Deep Face Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Jan-Michael Frahm,et al.  Pixelwise View Selection for Unstructured Multi-View Stereo , 2016, ECCV.

[8]  Szymon Rusinkiewicz,et al.  Learning Local Descriptors With a CDF-Based Dynamic Soft Margin , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[9]  Shih-Fu Chang,et al.  Learning Spread-Out Local Feature Descriptors , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Shengcai Liao,et al.  Deep Metric Learning for Person Re-identification , 2014, 2014 22nd International Conference on Pattern Recognition.

[11]  Gustavo Carneiro,et al.  Learning Local Image Descriptors with Deep Siamese and Triplet Convolutional Networks by Minimizing Global Loss Functions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  James T. Kwok,et al.  Generalizing from a Few Examples , 2019, ACM Comput. Surv..

[13]  Krystian Mikolajczyk,et al.  SOLAR: Second-Order Loss and Attention for Image Retrieval , 2020, ECCV.

[14]  Yan Lu,et al.  Local Descriptors Optimized for Average Precision , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Torsten Sattler,et al.  Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[17]  Jan-Michael Frahm,et al.  From single image query to detailed 3D reconstruction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[19]  Yu Xin,et al.  SOSNet: Second Order Similarity Regularization for Local Descriptor Learning , 2019, Computer Vision and Pattern Recognition.

[20]  Jiri Matas,et al.  Working hard to know your neighbor's margins: Local descriptor learning loss , 2017, NIPS.

[21]  Yiannis Demiris,et al.  D2D: Keypoint Extraction with Describe to Detect Approach , 2020, ACCV.

[22]  Stella X. Yu,et al.  Large-Scale Long-Tailed Recognition in an Open World , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Shankar Krishnan,et al.  Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Stefanos Zafeiriou,et al.  ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Margarita Chli,et al.  Learning Deep Descriptors with Scale-Aware Triplet Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Xin Yu,et al.  SOSNet: Second Order Similarity Regularization for Local Descriptor Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Krystian Mikolajczyk,et al.  Learning local feature descriptors with triplets and shallow convolutional neural networks , 2016, BMVC.

[28]  Lei Zhou,et al.  GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints , 2018, ECCV.

[29]  Tomasz Malisiewicz,et al.  SuperPoint: Self-Supervised Interest Point Detection and Description , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[30]  Zhuowen Tu,et al.  Robust Point Matching via Vector Field Consensus , 2014, IEEE Transactions on Image Processing.

[31]  Shuang Wang,et al.  Better and Faster: Exponential Loss for Image Patch Matching , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[32]  Torsten Sattler,et al.  Comparative Evaluation of Hand-Crafted and Learned Local Features , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Long Quan,et al.  ASLFeat: Learning Local Features of Accurate Shape and Localization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Weihong Deng,et al.  SFace: Sigmoid-Constrained Hypersphere Loss for Robust Face Recognition , 2021, IEEE Transactions on Image Processing.

[35]  Feiyue Huang,et al.  CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[37]  Pascal Fua,et al.  Beyond Cartesian Representations for Local Descriptors , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[38]  Iasonas Kokkinos,et al.  Discriminative Learning of Deep Convolutional Feature Point Descriptors , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[39]  Bin Fan,et al.  L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Yichen Wei,et al.  Circle Loss: A Unified Perspective of Pair Similarity Optimization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Leslie N. Smith,et al.  Cyclical Learning Rates for Training Neural Networks , 2015, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[42]  Bhiksha Raj,et al.  SphereFace: Deep Hypersphere Embedding for Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[44]  Andrea Vedaldi,et al.  HPatches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Krystian Mikolajczyk,et al.  HyNet: Learning Local Descriptor with Hybrid Similarity Measure and Triplet Loss , 2020, NeurIPS.

[46]  Yair Movshovitz-Attias,et al.  No Fuss Distance Metric Learning Using Proxies , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[47]  Jan-Michael Frahm,et al.  Structure-from-Motion Revisited , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Jian Wang,et al.  Deep Metric Learning with Angular Loss , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[49]  Rahul Sukthankar,et al.  MatchNet: Unifying feature and metric learning for patch-based matching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Carlos D. Castillo,et al.  L2-constrained Softmax Loss for Discriminative Face Verification , 2017, ArXiv.

[51]  Yuxiao Hu,et al.  MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition , 2016, ECCV.

[52]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[53]  Ser-Nam Lim,et al.  A Metric Learning Reality Check , 2020, ECCV.

[54]  Matthew A. Brown,et al.  Learning Local Image Descriptors , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[55]  Silvio Savarese,et al.  Deep Metric Learning via Lifted Structured Feature Embedding , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Songcan Chen,et al.  Recent Advances in Open Set Recognition: A Survey , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[57]  Gabriela Csurka,et al.  R2D2: Repeatable and Reliable Detector and Descriptor , 2019, ArXiv.