Semi-Supervised Weight Learning for the Spatial Search Method in ConvNet-Based Image Retrieval

As a state-of-the-art ConvNet-based image retrieval method, spatial search has shown excellent retrieval performance and outperformed competing methods. A key component of this method is a weighted combination of distances evaluated at different regions of a query image. However, these weights are currently tuned manually through a trial-and-error exhaustive search. This not only makes parameter tuning lengthy but also offers no guarantee that the tuned weights are optimal. Moreover, the tuned weights may not generalise when the nature of the image data set changes. To address these issues, we propose to learn the combination weights automatically from retrieval ground truth. Specifically, we develop a method called semi-supervised weight learning (SWL) within the framework of distance metric learning. In addition to generating triplet constraints from the retrieval ground truth, we leverage unlabelled images to generate numerous unsupervised constraints, which stabilise the learning process and improve learning efficiency. By linking the resulting large-scale optimisation problem to a recent primal solver for linear support vector machines, we derive an efficient algorithm to solve it. Experimental results on three benchmark data sets and a newly collected archival photo data set demonstrate the effectiveness of the proposed approach: it achieves retrieval performance comparable to or better than manual tuning, especially on the new archival photo data set.
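The abstract does not give the exact formulation, but the link between triplet constraints and a linear SVM can be illustrated. If the combined distance is linear in the weights, D_w(q, x) = sum_k w_k d_k(q, x), then asking a relevant image x+ to be closer to the query q than an irrelevant image x- by a unit margin gives w . (d(q, x-) - d(q, x+)) >= 1, which is a linear SVM constraint on the difference vector with label +1. The sketch below is a minimal illustration under our own assumptions: a squared hinge loss, L2 regularisation, nonnegative weights, and plain projected gradient descent in place of the primal linear-SVM solver the abstract refers to. The names `combined_distance` and `learn_weights` are hypothetical, and the unsupervised constraints from unlabelled images are omitted.

```python
from typing import List, Sequence, Tuple

# A triplet is (d_pos, d_neg): per-region distances from the query to a
# relevant image and to an irrelevant image (one entry per region k).
Triplet = Tuple[Sequence[float], Sequence[float]]


def combined_distance(w: Sequence[float], region_dists: Sequence[float]) -> float:
    """Weighted combination of per-region distances: D_w = sum_k w_k * d_k."""
    return sum(wk * dk for wk, dk in zip(w, region_dists))


def learn_weights(
    triplets: List[Triplet],
    reg: float = 0.01,
    lr: float = 0.1,
    epochs: int = 500,
) -> List[float]:
    """Learn nonnegative region weights from triplet constraints.

    Assumed objective (illustrative only, not taken from the paper):
        reg/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - w . z_i)^2,
    where z_i = d_neg_i - d_pos_i. Requiring w . z_i >= 1 means the irrelevant
    image must be farther from the query than the relevant one by a unit
    margin, i.e. a linear SVM with feature vector z_i and label +1.
    Minimised here by plain projected gradient descent onto w >= 0.
    """
    n = len(triplets)
    k = len(triplets[0][0])
    # Difference vectors play the role of SVM training examples.
    zs = [[dn - dp for dp, dn in zip(d_pos, d_neg)] for d_pos, d_neg in triplets]
    w = [1.0 / k] * k  # start from uniform weights
    for _ in range(epochs):
        grad = [reg * wk for wk in w]  # gradient of the L2 regulariser
        for z in zs:
            slack = 1.0 - sum(wk * zk for wk, zk in zip(w, z))
            if slack > 0:  # only constraints with w . z < 1 contribute
                for j in range(k):
                    grad[j] -= 2.0 * slack * z[j] / n
        # Gradient step followed by projection onto the nonnegative orthant.
        w = [max(0.0, wk - lr * g) for wk, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Toy data: region 0 separates relevant from irrelevant images, region 1 is noise.
    toy = [([0.2, 0.9], [0.8, 0.1]), ([0.1, 0.5], [0.8, 0.5]), ([0.3, 0.6], [1.0, 0.4])]
    print("learned weights:", learn_weights(toy))
```

On the toy data, uniform weights mis-rank the first triplet (0.55 for the relevant image versus 0.45 for the irrelevant one); after training, the weight on the noisy region is pushed toward zero and all three triplets are ranked correctly. Because each constraint is linear in w, a dedicated large-scale linear-SVM solver could replace the gradient loop without changing the model.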
