Label Prediction Framework For Semi-Supervised Cross-Modal Retrieval

Cross-modal data matching refers to retrieving data from one modality given a query from another modality. In general, supervised algorithms achieve better retrieval performance than their unsupervised counterparts, as they can learn more representative features by leveraging the available label information. However, this comes at the cost of requiring a large number of labeled examples, which may not always be available. In this work, we propose a novel framework for the semi-supervised cross-modal retrieval setting, which predicts the labels of the unlabeled data using complementary information from the different modalities. The proposed framework can be used as an add-on with any baseline cross-modal algorithm to give significant performance improvement, even when labeled data is limited. Extensive evaluation using several baseline algorithms across three different datasets shows the effectiveness of our label prediction framework.
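
The abstract does not say how the labels are predicted, so the following is only an illustrative sketch of the general idea of using complementary modalities, not the method proposed in the paper. It assumes paired image and text features, a nearest-centroid classifier per modality (a stand-in chosen for brevity), and a hypothetical agreement rule: a predicted label is kept only when both modalities agree and both are confident. All function names and the `threshold` parameter are assumptions made for this example.

```python
import math


def class_centroids(features, labels, num_classes):
    """Mean feature vector per class (assumes every class has a labeled sample)."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for x, y in zip(features, labels):
        counts[y] += 1
        for j, value in enumerate(x):
            sums[y][j] += value
    return [[s / counts[c] for s in sums[c]] for c in range(num_classes)]


def centroid_probs(centroids, x):
    """Softmax over negative squared distances from x to each class centroid."""
    logits = [-sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_labels(img_l, txt_l, y_l, img_u, txt_u, num_classes, threshold=0.8):
    """Return (labels, keep) for the unlabeled pairs.

    Hypothetical rule: keep a predicted label only if the image-side and
    text-side classifiers agree and both confidences reach `threshold`.
    """
    img_centroids = class_centroids(img_l, y_l, num_classes)
    txt_centroids = class_centroids(txt_l, y_l, num_classes)
    labels, keep = [], []
    for xi, xt in zip(img_u, txt_u):
        p_img = centroid_probs(img_centroids, xi)
        p_txt = centroid_probs(txt_centroids, xt)
        y_img = max(range(num_classes), key=p_img.__getitem__)
        y_txt = max(range(num_classes), key=p_txt.__getitem__)
        confident = min(p_img[y_img], p_txt[y_txt]) >= threshold
        labels.append(y_img)
        keep.append(y_img == y_txt and confident)
    return labels, keep


# Toy data: two well-separated classes in each modality.
img_labeled = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
txt_labeled = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
y_labeled = [0, 0, 1, 1]
# The third unlabeled pair is deliberately inconsistent across modalities.
img_unlabeled = [[0.0, 0.5], [10.0, 10.5], [0.0, 0.5]]
txt_unlabeled = [[0.0, 0.5], [10.0, 10.5], [10.0, 10.5]]

pred, keep = predict_labels(
    img_labeled, txt_labeled, y_labeled, img_unlabeled, txt_unlabeled, num_classes=2
)
```

In this toy setting the pairs flagged in `keep` could be added to the labeled set before training any baseline cross-modal algorithm, while the inconsistent third pair would be left unlabeled.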
