Scalable Nonlinear Embeddings for Semantic Category-Based Image Retrieval

We propose a novel algorithm for the task of supervised discriminative distance learning by nonlinearly embedding vectors into a low dimensional Euclidean space. We work in the challenging setting where supervision is with constraints on similar and dissimilar pairs while training. The proposed method is derived by an approximate kernelization of a linear Mahalanobis-like distance metric learning algorithm and can also be seen as a kernel neural network. The number of model parameters and test time evaluation complexity of the proposed method are O(dD) where D is the dimensionality of the input features and d is the dimension of the projection space -- this is in contrast to the usual kernelization methods as, unlike them, the complexity does not scale linearly with the number of training examples. We propose a stochastic gradient based learning algorithm which makes the method scalable (w.r.t. the number of training examples), while being nonlinear. We train the method with up to half a million training pairs of 4096 dimensional CNN features. We give empirical comparisons with relevant baselines on seven challenging datasets for the task of low dimensional semantic category based image retrieval.

[1]  Lorenzo Torresani,et al.  Large Margin Component Analysis , 2006, NIPS.

[2]  Andrew Zisserman,et al.  Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[3]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[4]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[5]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[6]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[7]  Boonserm Kijsirikul,et al.  A new kernelization framework for Mahalanobis distance learning algorithms , 2010, Neurocomputing.

[8]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[9]  Shishir K. Shah,et al.  A survey of approaches and trends in person re-identification , 2014, Image Vis. Comput..

[10]  Cordelia Schmid,et al.  Is that you? Metric learning approaches for face identification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[11]  Bernhard Schölkopf,et al.  New Support Vector Algorithms , 2000, Neural Computation.

[12]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[13]  Andrea Vedaldi,et al.  MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.

[14]  Ivor W. Tsang,et al.  Distance metric learning with kernels , 2003 .

[15]  Subhransu Maji,et al.  Deep convolutional filter banks for texture recognition and segmentation , 2014, ArXiv.

[16]  Inderjit S. Dhillon,et al.  Information-theoretic metric learning , 2006, ICML '07.

[17]  Leonidas J. Guibas,et al.  The Earth Mover's Distance as a Metric for Image Retrieval , 2000, International Journal of Computer Vision.

[18]  F. Jurie,et al.  Learning Non-linear SVM in Input Space for Image Classification , 2014 .

[19]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[20]  Yoram Singer,et al.  Online and batch learning of pseudo-metrics , 2004, ICML.

[21]  Cordelia Schmid,et al.  TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[22]  Brian Kulis,et al.  Metric Learning: A Survey , 2013, Found. Trends Mach. Learn..

[23]  Ivor W. Tsang,et al.  The pre-image problem in kernel methods , 2003, IEEE Transactions on Neural Networks.

[24]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[25]  Jiwen Lu,et al.  Neighborhood repulsed metric learning for kinship verification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Stephen Tyree,et al.  Non-linear Metric Learning , 2012, NIPS.

[27]  Andrew Zisserman,et al.  Fisher Vector Faces in the Wild , 2013, BMVC.

[28]  Christopher J. C. Burges,et al.  Simplified Support Vector Decision Rules , 1996, ICML.

[29]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[30]  Anton van den Hengel,et al.  Non-sparse linear representations for visual tracking with online reservoir metric learning , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[32]  Amir Globerson,et al.  Metric Learning by Collapsing Classes , 2005, NIPS.

[33]  David J. Fleet,et al.  Hamming Distance Metric Learning , 2012, NIPS.

[34]  Wei Liu,et al.  Learning Distance Metrics with Contextual Constraints for Image Retrieval , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[35]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[36]  Geoffrey E. Hinton,et al.  Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure , 2007, AISTATS.

[37]  Frédéric Jurie,et al.  PCCA: A new approach for distance learning from sparse pairwise constraints , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Cordelia Schmid,et al.  Expanded Parts Model for Human Attribute and Action Recognition in Still Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Inderjit S. Dhillon,et al.  Online Metric Learning and Fast Similarity Search , 2008, NIPS.

[40]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[41]  Cees Snoek,et al.  Attributes Make Sense on Segmented Objects , 2014, ECCV.

[42]  David Stutz,et al.  Neural Codes for Image Retrieval , 2015 .

[43]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[44]  Katja Markert,et al.  Learning Models for Object Recognition from Natural Language Descriptions , 2009, BMVC.

[45]  Geoffrey E. Hinton,et al.  Neighbourhood Components Analysis , 2004, NIPS.

[46]  Edward H. Adelson,et al.  Material perception: What can you see in a brief glance? , 2010 .

[47]  Gaurav Sharma,et al.  Learning discriminative spatial representation for image classification , 2011, BMVC.

[48]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[49]  Thorsten Joachims,et al.  Learning a Distance Metric from Relative Comparisons , 2003, NIPS.

[50]  Andrew Zisserman,et al.  Efficient additive kernels via explicit feature maps , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[51]  Marc Sebban,et al.  A Survey on Metric Learning for Feature Vectors and Structured Data , 2013, ArXiv.

[52]  Gabriela Csurka,et al.  Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost , 2012, ECCV.

[53]  Gaurav Sharma,et al.  A Novel Approach for Efficient SVM Classification with Histogram Intersection Kernel , 2013, BMVC.