Alternating Co-Quantization for Cross-Modal Hashing

This paper addresses the problem of unsupervised learning of binary hash codes for efficient cross-modal retrieval. Many unimodal hashing studies have shown that both preserving data similarity and maintaining quantization quality are essential for improving retrieval performance with binary hash codes. However, most existing cross-modal hashing methods have focused mainly on the former, and the latter remains almost untouched. We propose a method for minimizing binary quantization errors that is tailored to cross-modal hashing. Our approach, named Alternating Co-Quantization (ACQ), alternately seeks binary quantizers for each modality space, using the connections to data in the other modality, so that the quantizers give minimal quantization errors while preserving data similarities. ACQ can be coupled with various existing cross-modal dimension reduction methods such as Canonical Correlation Analysis (CCA) and substantially boosts their retrieval performance in the Hamming space. Extensive experiments demonstrate that ACQ can outperform several state-of-the-art methods, even when it is combined with simple CCA.
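The abstract does not state ACQ's objective or update rules, so the following is not the authors' algorithm. It is a minimal sketch of the general idea of alternating quantization across two modalities, using a swapped-in technique: an ITQ-style orthogonal Procrustes update applied to two paired views that share one set of binary codes. The names `X`, `Y`, `procrustes_rotation`, and `alternating_shared_codes` are hypothetical, and `X` and `Y` are assumed to be zero-centered, row-aligned projections of the two modalities (for example, CCA outputs) with the same dimensionality.

```python
import numpy as np


def quantization_error(V, R, B):
    """Squared Frobenius distance between codes B and rotated data V @ R."""
    return float(np.sum((B - V @ R) ** 2))


def procrustes_rotation(V, B):
    """Orthogonal R minimizing ||B - V @ R||_F^2 (orthogonal Procrustes)."""
    U, _, Wt = np.linalg.svd(V.T @ B)
    return U @ Wt


def alternating_shared_codes(X, Y, n_iter=20, seed=0):
    """Alternate between shared binary codes and per-modality rotations.

    Assumption (not from the paper): both modalities are pulled toward one
    shared code matrix B, and the summed quantization error is minimized.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Start from a random orthogonal matrix, used for both modalities.
    Rx, _ = np.linalg.qr(rng.standard_normal((d, d)))
    Ry = Rx.copy()
    history = []
    for _ in range(n_iter):
        # Code step: with rotations fixed, the sign of the summed
        # projections minimizes the total error over B in {-1, +1}.
        B = np.where(X @ Rx + Y @ Ry >= 0, 1.0, -1.0)
        # Rotation steps: with codes fixed, each modality solves its own
        # Procrustes problem.
        Rx = procrustes_rotation(X, B)
        Ry = procrustes_rotation(Y, B)
        history.append(
            quantization_error(X, Rx, B) + quantization_error(Y, Ry, B)
        )
    return Rx, Ry, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Z = rng.standard_normal((500, 16))           # shared latent signal
    X = Z + 0.1 * rng.standard_normal(Z.shape)   # stand-in for modality 1
    Y = Z + 0.1 * rng.standard_normal(Z.shape)   # stand-in for modality 2
    Rx, Ry, history = alternating_shared_codes(X, Y)
    print("first and last total error:", history[0], history[-1])
```

Because each step minimizes the same summed error with the other variables held fixed, the recorded error never increases from one iteration to the next. Hash codes for a new item in either modality would then be taken as the sign of its rotated projection, so both modalities are compared in the same Hamming space.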
