Learning Local Descriptors by Optimizing the Keypoint-Correspondence Criterion : Applications to Face Matching , Learning from Unlabeled Videos and 3 D-Shape Retrieval

Current best local descriptors are learned on a large dataset of matching and non-matching keypoint pairs. However, data of this kind is not always available since detailed keypoint correspondences can be hard to establish. On the other hand, we can often obtain labels for pairs of keypoint bags. For example, keypoint bags extracted from two images of the same object under different views form a matching pair, and keypoint bags extracted from images of different objects form a non-matching pair. On average, matching pairs should contain more corresponding keypoints than non-matching pairs. We describe an end-to-end differentiable architecture that enables the learning of local keypoint descriptors from such weakly-labeled data.

[1]  Tomás Pajdla,et al.  NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Heiga Zen,et al.  WaveNet: A Generative Model for Raw Audio , 2016, SSW.

[3]  Nikos Komodakis,et al.  Learning to compare image patches via convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Thomas G. Dietterich,et al.  Solving the Multiple Instance Problem with Axis-Parallel Rectangles , 1997, Artif. Intell..

[5]  Edward K. Wong,et al.  Deepshape: Deep learned shape descriptor for 3D shape matching and retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Andrew Zisserman,et al.  Fisher Vector Faces in the Wild , 2013, BMVC.

[7]  Tom Drummond,et al.  Machine Learning for High-Speed Corner Detection , 2006, ECCV.

[8]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[9]  Krystian Mikolajczyk,et al.  PN-Net: Conjoined Triple Deep Network for Learning Local Image Descriptors , 2016, ArXiv.

[10]  Nitish Srivastava Unsupervised Learning of Visual Representations using Videos , 2015 .

[11]  Matthew A. Brown,et al.  Recognising panoramas , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[12]  Xiaowei Zhou,et al.  3D Shape Reconstruction from 2D Landmarks: A Convex Formulation , 2014, ArXiv.

[13]  Guillaume Lavoué,et al.  Combination of bag-of-words descriptors for robust partial shape retrieval , 2012, The Visual Computer.

[14]  Kin-Man Lam,et al.  High-Resolution Face Verification Using Pore-Scale Facial Features , 2015, IEEE Transactions on Image Processing.

[15]  Xin Yang,et al.  Local difference binary for ultrafast and distinctive feature description. , 2013, IEEE transactions on pattern analysis and machine intelligence.

[16]  Christian Osendorfer,et al.  Convolutional Neural Networks Learn Compact Local Image Descriptors , 2013, ICONIP.

[17]  Krystian Mikolajczyk,et al.  Learning local feature descriptors with triplets and shallow convolutional neural networks , 2016, BMVC.

[18]  Koray Kavukcuoglu,et al.  Pixel Recurrent Neural Networks , 2016, ICML.

[19]  Cordelia Schmid,et al.  Convolutional Kernel Networks , 2014, NIPS.

[20]  Martial Hebert,et al.  Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification , 2016, ECCV.

[21]  Iasonas Kokkinos,et al.  Discriminative Learning of Deep Convolutional Feature Point Descriptors , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Bin Fan,et al.  L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Vincent Lepetit,et al.  TILDE: A Temporally Invariant Learned DEtector , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Iasonas Kokkinos,et al.  Fracking Deep Convolutional Image Descriptors , 2014, ArXiv.

[26]  Rahul Sukthankar,et al.  MatchNet: Unifying feature and metric learning for patch-based matching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Longin Jan Latecki,et al.  GIFT: A Real-Time and Scalable 3D Shape Search Engine , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Xuelong Li,et al.  Learning Compact Feature Descriptor and Adaptive Matching Framework for Face Recognition , 2015, IEEE Transactions on Image Processing.

[29]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[30]  Ioannis Pratikakis,et al.  3D Object Retrieval using an Efficient and Compact Hybrid Shape Descriptor , 2008, 3DOR@Eurographics.

[31]  Ali Shokoufandeh,et al.  Retrieving articulated 3-D models using medial surfaces , 2008, Machine Vision and Applications.

[32]  Jiri Matas,et al.  Working hard to know your neighbor's margins: Local descriptor learning loss , 2017, NIPS.

[33]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[34]  Afzal Godil,et al.  Salient local 3D features for 3D shape retrieval , 2011, Electronic Imaging.

[35]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[36]  Hamid Laga,et al.  Compact Vectors of Locally Aggregated Tensors for 3D Shape Retrieval , 2013, 3DOR@Eurographics.

[37]  Xin Yang,et al.  Local Difference Binary for Ultrafast and Distinctive Feature Description , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[39]  Andrew Zisserman,et al.  All About VLAD , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Gang Hua,et al.  Discriminative Learning of Local Image Descriptors , 1990, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Ioannis Pratikakis,et al.  Retrieval of 3D Articulated Objects Using a Graph-based Representation , 2009, 3DOR@Eurographics.

[42]  Vincent Lepetit,et al.  Boosting Binary Keypoint Descriptors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Shmuel Peleg,et al.  Temporal Segmentation of Egocentric Videos , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[45]  Paolo Favaro,et al.  Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.

[46]  Michael Goesele,et al.  Multi-View Stereo for Community Photo Collections , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[47]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[48]  Thomas A. Funkhouser,et al.  The Princeton Shape Benchmark , 2004, Proceedings Shape Modeling Applications, 2004..

[49]  Yasuyuki Matsushita,et al.  GMS: Grid-Based Motion Statistics for Fast, Ultra-robust Feature Correspondence , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Cordelia Schmid,et al.  Local Convolutional Features with Unsupervised Training for Image Retrieval , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[51]  Barbara Caputo,et al.  Recognition with local features: the kernel recipe , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[52]  Andrew Zisserman,et al.  Learning Local Feature Descriptors Using Convex Optimisation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[53]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[54]  David W. Jacobs,et al.  WarpNet: Weakly Supervised Matching for Single-View Reconstruction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Arman Savran,et al.  Bosphorus Database for 3D Face Analysis , 2008, BIOID.

[56]  Hamid Laga,et al.  Covariance Descriptors for 3D Shape Matching and Retrieval , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[57]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Pascal Fua,et al.  Receptive Fields Selection for Binary Feature Description , 2014, IEEE Transactions on Image Processing.

[59]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[60]  Silvio Savarese,et al.  Universal Correspondence Network , 2016, NIPS.

[61]  Andrea Vedaldi,et al.  HPatches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Ming Ouhyoung,et al.  On Visual Similarity Based 3D Model Retrieval , 2003, Comput. Graph. Forum.

[63]  J. M. M. Montiel,et al.  ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[64]  Matthew A. Brown,et al.  Learning Local Image Descriptors , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[65]  Vincent Lepetit,et al.  LIFT: Learned Invariant Feature Transform , 2016, ECCV.

[66]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[67]  Ming Yang,et al.  Query Specific Rank Fusion for Image Retrieval , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.