Learning local feature representation from matching, clustering and spatial transform

Abstract

This paper focuses on learning local image region representations via deep neural networks. Existing works mainly learn from matched corresponding image patches; features learned this way are overly sensitive to individual local patch-matching results and cannot handle aggregation-based tasks such as image-level retrieval. We therefore propose to use both the matched corresponding image patches and the clustering results as labels for network training. To resolve the inconsistency between the matched correspondences and the clustering results, we propose a semi-supervised iterative training scheme together with a dual margins loss. Moreover, a jointly learned spatial transform prediction network is used to improve the spatial transform invariance of the learned local features. With SIFT as the label initializer, experiments show performance comparable to or even better than that of the hand-crafted feature, which sheds light on learning local feature representations in an unsupervised or weakly supervised manner.
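The abstract does not spell out the training objective. Purely as an illustration, the following sketch assumes one plausible reading of "clustering results as labels" plus "dual margins": descriptors (for example, SIFT vectors) are clustered with k-means to obtain pseudo-labels, and a triplet-style hinge uses a larger margin `m_match` for matched correspondences (strong labels) and a smaller margin `m_cluster` for same-cluster patches (weaker labels). The function names, the farthest-point initialization and the margin values are hypothetical and are not the paper's formulation.

```python
import numpy as np


def kmeans_labels(desc: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Cluster descriptors (n x dim) with Lloyd's k-means; return one cluster id per row.

    Farthest-point initialization is an assumption made here for determinism.
    """
    desc = np.asarray(desc, dtype=float)
    # Farthest-point init: start from the first descriptor, then repeatedly add
    # the descriptor farthest from all centers chosen so far.
    centers = [desc[0]]
    for _ in range(1, k):
        d2 = ((desc[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2)
        centers.append(desc[d2.min(axis=1).argmax()])
    centers = np.array(centers)
    labels = np.zeros(len(desc), dtype=int)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        d2 = ((desc[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = desc[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def dual_margin_loss(anchor, matched, cluster_pos, negative,
                     m_match: float = 1.0, m_cluster: float = 0.5) -> float:
    """Hypothetical two-margin triplet hinge, averaged over a batch of rows."""
    def dist(a, b):
        return np.linalg.norm(np.asarray(a, float) - np.asarray(b, float), axis=-1)

    d_neg = dist(anchor, negative)
    # Matched correspondences (strong labels) must beat the negative by m_match.
    l_match = np.maximum(0.0, dist(anchor, matched) - d_neg + m_match)
    # Same-cluster patches (weaker labels) only need to beat it by m_cluster.
    l_cluster = np.maximum(0.0, dist(anchor, cluster_pos) - d_neg + m_cluster)
    return float(np.mean(l_match + l_cluster))
```

Giving cluster-mates a smaller margin is one way to let noisy clustering labels pull descriptors together less aggressively than verified matches; the paper's actual loss and iterative relabelling scheme may differ.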
