Learning sparse latent representation and distance metric for image retrieval

The performance of image retrieval depends critically on the semantic representation and on the distance function used to estimate the similarity of two images. A good representation should integrate multiple visual and textual (e.g., tag) features and move a step closer to the true semantics of interest (e.g., concepts). Because the distance function operates on the representation, the two are interdependent and should be learned at the same time. We propose a probabilistic solution that learns from data both the representation, drawn from multiple feature types and modalities, and the distance metric. The learning is regularised so that the learned representation and information-theoretic metric (i) preserve the regularities of the visual/textual spaces, (ii) enhance structured sparsity, (iii) encourage small intra-concept distances, and (iv) keep inter-concept images separated. We demonstrate the capacity of our method on the NUS-WIDE dataset. On the well-studied 13-animal subset, our method outperforms state-of-the-art rivals. On the subset of single-concept images, we gain a 79.5% improvement over the standard nearest-neighbours approach on the MAP score, and 45.7% on the NDCG.
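
For readers unfamiliar with the two retrieval scores quoted above, the following is a minimal sketch of how MAP and NDCG are commonly computed from ranked result lists. It is not taken from the paper: the function names are hypothetical, average precision is normalised by the number of relevant items appearing in the ranked list (assuming the list contains every relevant item), and the discount `1 / log2(rank + 1)` is one widely used DCG variant that may differ from the exact formulation the authors evaluated with.

```python
import math


def average_precision(relevance):
    """Average precision for one query.

    `relevance` is a list of 0/1 flags in ranked order (best match first).
    Assumption: every relevant item appears somewhere in the list.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / hits if hits else 0.0


def mean_average_precision(queries):
    """MAP: mean of the per-query average precision values."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


def ndcg(gains, k=None):
    """Normalised discounted cumulative gain at cutoff k.

    `gains` holds graded (or binary) relevance values in ranked order.
    Uses the common 1 / log2(rank + 1) discount (an assumption here).
    """
    k = len(gains) if k is None else k

    def dcg(values):
        return sum(
            g / math.log2(rank + 1)
            for rank, g in enumerate(values[:k], start=1)
        )

    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

For example, a ranked list with relevance flags `[1, 0, 1, 0]` has relevant items at ranks 1 and 3, so its average precision is (1/1 + 2/3) / 2; the same list scores below 1 on NDCG because the second relevant item is not ranked second.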
