Collective Affinity Learning for Partial Cross-Modal Hashing

In the past decade, various unsupervised hashing methods have been developed for cross-modal retrieval. However, in real-world applications the data are often incomplete: each modality may be missing some samples. Most existing works assume that every object appears in both modalities, so they may not work well on partial multi-modal data. To address this problem, we propose a novel Collective Affinity Learning Method (CALM), which collectively and adaptively learns an anchor graph for generating binary codes on partial multi-modal data. In CALM, we first construct modality-specific bipartite graphs collectively and derive a probabilistic model to infer complete data-to-anchor affinities for each modality. Theoretical analysis reveals the model's ability to recover missing adjacency information. Moreover, a robust model is proposed to fuse these modality-specific affinities by adaptively learning a unified anchor graph. The neighborhood information from the learned anchor graph then acts as feedback that guides the preceding affinity reconstruction procedure. To solve the formulated optimization problem, we further develop an effective algorithm with linear time complexity and fast convergence. Finally, Anchor Graph Hashing (AGH) is performed on the fused affinities for cross-modal retrieval. Experimental results on benchmark datasets show that the proposed CALM consistently outperforms existing methods.
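To make the pipeline above more concrete, the following is a minimal NumPy sketch of its generic building blocks: data-to-anchor affinities, a fusion step, and one-layer AGH binarization. It is not the CALM algorithm. The function names (`anchor_affinities`, `fuse_available`, `agh_codes`), the Gaussian kernel, the bandwidth heuristic, and the parameter `s` are all assumptions made for illustration, and the fusion step is a plain average over the modalities in which a sample is observed, standing in for the probabilistic affinity completion and robust adaptive fusion described in the abstract.

```python
import numpy as np


def anchor_affinities(X, anchors, s=3, sigma=None):
    """Data-to-anchor affinities: Gaussian kernel on the s nearest anchors.

    Returns an (n, m) matrix whose rows are non-negative and sum to 1.
    The kernel and the bandwidth heuristic are illustrative assumptions.
    """
    # Squared Euclidean distances between every sample and every anchor.
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    if sigma is None:
        sigma = np.sqrt(d2.mean()) + 1e-12
    rows = np.arange(X.shape[0])[:, None]
    nearest = np.argsort(d2, axis=1)[:, :s]  # indices of the s closest anchors
    Z = np.zeros_like(d2)
    Z[rows, nearest] = np.exp(-d2[rows, nearest] / (2.0 * sigma ** 2))
    Z /= Z.sum(axis=1, keepdims=True)
    return Z


def fuse_available(Zs, masks):
    """Naive stand-in for fusion: average each sample's affinity rows over
    the modalities where it is observed (mask == True).

    Assumes every sample is observed in at least one modality.
    """
    weights = [np.asarray(m, dtype=float)[:, None] for m in masks]
    total = sum(w * Z for w, Z in zip(weights, Zs))
    count = sum(weights)
    return total / np.maximum(count, 1.0)


def agh_codes(Z, n_bits):
    """One-layer Anchor Graph Hashing on an (n, m) affinity matrix Z.

    Thresholds the spectral embedding Y = Z W at zero, with
    W = sqrt(n) * Lambda^{-1/2} V Sigma^{-1/2}. Requires n_bits < m.
    """
    n, _ = Z.shape
    degrees = Z.sum(axis=0)  # anchor degrees (diagonal of Lambda)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(degrees, 1e-12))
    M = (Z.T @ Z) * inv_sqrt[:, None] * inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]
    keep = order[1:n_bits + 1]  # skip the trivial top eigenvector
    scale = 1.0 / np.sqrt(np.maximum(vals[keep], 1e-12))
    W = np.sqrt(n) * inv_sqrt[:, None] * vecs[:, keep] * scale[None, :]
    Y = Z @ W
    return (Y > 0).astype(np.uint8), W
```

In this sketch, each modality would supply its own feature matrix, anchors, and a boolean mask marking which samples are observed; the rows of a modality's affinity matrix for unobserved samples are ignored by `fuse_available`. The number of bits passed to `agh_codes` must be smaller than the number of anchors.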
