Semantics Consistent Adversarial Cross-Modal Retrieval

Cross-modal retrieval returns results relevant to a query in one modality from the other modalities. Its main challenge is the "heterogeneity gap" among modalities: different modalities have different distributions and representations, so their similarity cannot be measured directly. In this paper, we propose a semantics-consistent adversarial cross-modal retrieval approach that learns a semantics-consistent representation for different modalities sharing the same semantic category. Specifically, we encourage the class centers of different modalities with the same semantic label to be as close as possible, and we also minimize the distances between samples and the same-label class centers of the other modality. Comprehensive experiments on the Wikipedia dataset demonstrate the efficiency and effectiveness of our approach for cross-modal retrieval.
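The two consistency terms described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the use of plain squared Euclidean distances, and the unweighted sum of the two terms are all assumptions for clarity.

```python
import numpy as np

def semantic_consistency_loss(img_feats, txt_feats, labels):
    """Hypothetical sketch of the two objectives in the abstract:
    (1) pull the class centers of the two modalities together, and
    (2) pull each sample toward the same-label center of the other modality.

    img_feats, txt_feats: (n, d) feature arrays in a common embedding space,
    paired row-by-row; labels: (n,) integer semantic labels.
    """
    center_term = 0.0
    sample_term = 0.0
    for c in np.unique(labels):
        idx = labels == c
        img_center = img_feats[idx].mean(axis=0)  # class center, image modality
        txt_center = txt_feats[idx].mean(axis=0)  # class center, text modality
        # (1) inter-modal center alignment for class c
        center_term += np.sum((img_center - txt_center) ** 2)
        # (2) samples vs. the same-label center of the other modality
        sample_term += np.sum((img_feats[idx] - txt_center) ** 2)
        sample_term += np.sum((txt_feats[idx] - img_center) ** 2)
    return center_term + sample_term
```

When the two modalities' embeddings coincide for every sample, both terms vanish; any gap between same-label samples or centers across modalities increases the loss, which is the behavior the abstract's objective calls for.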
