Modal-Dependent Retrieval Based on Mid-Level Semantic Enhancement Space

Cross-media retrieval refers to submitting a query of one media type and obtaining semantically similar results of a different media type. Existing cross-media retrieval algorithms typically learn a pair of linear (or non-linear) mappings that project image and text features from their original heterogeneous feature spaces into a common homogeneous space; these traditional methods exploit only the correlation between the original features and the high-level semantic space. Moreover, owing to the "semantic gap" incurred by visual feature extraction, textual features are often more discriminative than visual features in their distribution over the semantic space. To address these problems, we establish a mid-level semantic enhancement space (MMSES) to enhance the discrimination capability of textual features. First, the original features are projected into the mid-level semantic enhancement space using linear discriminant analysis; they are then projected into the high-level semantic space using a fixed-distance projection; finally, a different mapping matrix is learned for each cross-media retrieval task during retrieval. We employ the standard Euclidean distance to measure the similarity between different modalities in the common space. The effectiveness of MMSES is validated through extensive experiments on three datasets: Wikipedia, Pascal Sentence, and INRIA-Websearch.
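The overall pipeline — modality-dependent linear maps into a common semantic space, followed by Euclidean-distance ranking — can be sketched as follows. This is a minimal illustration, not the paper's method: the data sizes are toy values, and a simple least-squares projection onto one-hot label vectors stands in for the LDA and fixed-distance projection steps described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy paired data: image and text features sharing class labels
# (all dimensions here are hypothetical).
n, d_img, d_txt, n_classes = 60, 20, 30, 3
labels = rng.integers(0, n_classes, size=n)
X_img = rng.normal(size=(n, d_img)) + labels[:, None]
X_txt = rng.normal(size=(n, d_txt)) + labels[:, None]

# One-hot semantic targets defining the common high-level space.
Y = np.eye(n_classes)[labels]

# Learn a separate (modality-dependent) linear map per modality via
# least squares -- a stand-in for the LDA + fixed-distance projection.
W_img, *_ = np.linalg.lstsq(X_img, Y, rcond=None)
W_txt, *_ = np.linalg.lstsq(X_txt, Y, rcond=None)

S_img = X_img @ W_img  # image features in the common semantic space
S_txt = X_txt @ W_txt  # text features in the common semantic space

def retrieve(query, gallery):
    """Rank gallery items by Euclidean distance to the query."""
    dists = np.linalg.norm(gallery - query, axis=1)
    return np.argsort(dists)

# Image-to-text retrieval for the first image query.
ranking = retrieve(S_img[0], S_txt)
```

Because each retrieval direction (image-to-text vs. text-to-image) can use its own projection matrices, the two tasks need not share a single common space, which is the essence of modality-dependent retrieval.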