Twin-Net Descriptor: Twin Negative Mining With Quad Loss for Patch-Based Matching

Local keypoint matching is an important step for computer vision based tasks. In recent years, Deep Convolutional Neural Network (CNN) based strategies have been employed to learn descriptor generation to enhance keypoint matching accuracy. Recent state-of-art works in this direction primarily rely upon a triplet based loss function (and its variations) utilizing three samples: an anchor, a positive and a negative. In this work we propose a novel “Twin Negative Mining” based sampling strategy coupled with a Quad loss function to train a deep neural network based pipeline (Twin-Net) for generating a robust descriptor that provides an increased discriminatory power to differentiate between patches that do not correspond to each other. Our sampling strategy and choice of loss function is aimed at placing an upper bound that descriptors of two patches representing same location could be at worst no more dissimilar than the descriptors of two similar looking patches that do-not belong to same 3D location. This results in an increase in the generalization capability of the network and outperforms its existing counterparts when trained over the same datasets. Twin-Net outputs a 128-dimensional descriptor and uses $L_{2}$ Distance as the similarity metric, and hence conforms to the classical descriptor matching pipelines such as that of SIFT. Our results on Brown and HPatches datasets demonstrate Twin-Net’s consistently better performance as well as better discriminatory and generalization capability as compared to the state-of-art.

[1]  Andrea Vedaldi,et al.  HPatches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jiri Matas,et al.  WxBS: Wide Baseline Stereo Generalizations , 2015, BMVC.

[4]  Pascal Fua,et al.  LF-Net: Learning Local Features from Images , 2018, NeurIPS.

[5]  Tieniu Tan,et al.  Deep semantic ranking based hashing for multi-label image retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Vincent Lepetit,et al.  TILDE: A Temporally Invariant Learned DEtector , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Shehzad Muhammad Hanif,et al.  Patch match networks: Improved two-channel and Siamese networks for image patch matching , 2019, Pattern Recognit. Lett..

[8]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[9]  Kaiqi Huang,et al.  Beyond Triplet Loss: A Deep Quadruplet Network for Person Re-identification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Rahul Sukthankar,et al.  MatchNet: Unifying feature and metric learning for patch-based matching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Torsten Sattler,et al.  Quad-Networks: Unsupervised Learning to Rank for Interest Point Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Krystian Mikolajczyk,et al.  Learning local feature descriptors with triplets and shallow convolutional neural networks , 2016, BMVC.

[14]  Vincent Lepetit,et al.  LIFT: Learned Invariant Feature Transform , 2016, ECCV.

[15]  Lucas Beyer,et al.  In Defense of the Triplet Loss for Person Re-Identification , 2017, ArXiv.

[16]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[17]  Michael Isard,et al.  Descriptor Learning for Efficient Retrieval , 2010, ECCV.

[18]  J. M. M. Montiel,et al.  ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[19]  Jan-Michael Frahm,et al.  From single image query to detailed 3D reconstruction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Jon Louis Bentley,et al.  Multidimensional binary search trees used for associative searching , 1975, CACM.

[21]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[22]  Andrew Zisserman,et al.  Learning Local Feature Descriptors Using Convex Optimisation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Yan Ke,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, CVPR 2004.

[24]  Jiri Matas,et al.  Working hard to know your neighbor's margins: Local descriptor learning loss , 2017, NIPS.

[25]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Yu Qiao,et al.  A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.

[27]  Tomasz Malisiewicz,et al.  SuperPoint: Self-Supervised Interest Point Detection and Description , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[28]  Gustavo Carneiro,et al.  Learning Local Image Descriptors with Deep Siamese and Triplet Convolutional Networks by Minimizing Global Loss Functions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Yann LeCun,et al.  Computing the stereo matching cost with a convolutional neural network , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Iasonas Kokkinos,et al.  Discriminative Learning of Deep Convolutional Feature Point Descriptors , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[32]  Bin Fan,et al.  L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Nikos Komodakis,et al.  Learning to compare image patches via convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[35]  Yan Lu,et al.  Local Descriptors Optimized for Average Precision , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Gang Hua,et al.  Discriminative Learning of Local Image Descriptors , 1990, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Shih-Fu Chang,et al.  Learning Discriminative and Transformation Covariant Local Feature Detectors , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Albert Gordo,et al.  Deep Image Retrieval: Learning Global Representations for Image Search , 2016, ECCV.