Multimedia Feature Mapping and Correlation Learning for Cross-Modal Retrieval

With the rapid growth of multimedia content on the Internet, effective cross-modal retrieval has attracted considerable attention. Many related works focus only on the semantic mapping of modalities in linear space and represent each modality with low-level hand-crafted features, ignoring both the latent semantic correlations between modalities in non-linear space and the extraction of high-level modality features. To address these issues, the authors first use convolutional neural networks and a topic model to obtain high-level semantic features for the different modalities. They then propose a supervised learning algorithm based on kernel partial least squares that captures semantic correlations across modalities. Finally, the joint model of the different modalities is learned from the training set. Extensive experiments are conducted on three benchmark datasets: Wikipedia, Pascal, and MIRFlickr. The results show that the proposed approach achieves better retrieval performance than several state-of-the-art approaches.

Keywords: Correlation Learning, Cross-Modal Retrieval, Multimedia, Semantic Feature, Text and Image
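
The abstract names kernel partial least squares as the correlation-learning step but gives no algorithmic detail. As a rough illustration only, the sketch below follows the general NIPALS-style kernel PLS procedure (iterate score vectors, then deflate the kernel and target matrices) on toy data. It is not the authors' algorithm: the RBF kernel, the `gamma` value, the toy "image" and "text" feature values, and all function names are assumptions made for this example, and the toy vectors merely stand in for CNN and topic-model features.

```python
import math

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) (kernel choice is an assumption)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]

def center_kernel(K):
    """Center a symmetric Gram matrix in feature space."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]

def center_columns(Y):
    """Subtract the column mean from every column of Y."""
    n = len(Y)
    means = [sum(col) / n for col in zip(*Y)]
    return [[y - m for y, m in zip(row, means)] for row in Y]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]

def kernel_pls_scores(K, Y, n_components=2, n_iter=50):
    """Return unit-norm, mutually orthogonal score vectors t_1..t_k.

    Assumes K is a centered symmetric Gram matrix and Y is column-centered
    with a non-constant first column (used to initialise u).
    """
    n, m = len(K), len(Y[0])
    K = [row[:] for row in K]
    Y = [row[:] for row in Y]
    scores = []
    for _ in range(n_components):
        u = normalize([row[0] for row in Y])
        for _ in range(n_iter):
            t = normalize(matvec(K, u))                      # t = K u
            c = [sum(Y[i][j] * t[i] for i in range(n))       # c = Y^T t
                 for j in range(m)]
            u = normalize(matvec(Y, c))                      # u = Y c
        scores.append(t)
        # Deflate: K <- (I - t t^T) K (I - t t^T), Y <- Y - t t^T Y
        Kt = matvec(K, t)
        tKt = sum(a * b for a, b in zip(t, Kt))
        K = [[K[i][j] - t[i] * Kt[j] - Kt[i] * t[j] + tKt * t[i] * t[j]
              for j in range(n)] for i in range(n)]
        Y = [[Y[i][j] - t[i] * c[j] for j in range(m)] for i in range(n)]
    return scores

def fit_training_targets(scores, Y):
    """Training-set fit Y_hat = T T^T Y for orthonormal score vectors T."""
    n, m = len(Y), len(Y[0])
    Y_hat = [[0.0] * m for _ in range(n)]
    for t in scores:
        c = [sum(Y[i][j] * t[i] for i in range(n)) for j in range(m)]
        for i in range(n):
            for j in range(m):
                Y_hat[i][j] += t[i] * c[j]
    return Y_hat

# Toy stand-ins (hypothetical values): six paired "image" and "text" feature vectors.
image_feats = [[0.0, 0.2], [0.5, 0.0], [0.2, 0.6],
               [1.5, 1.2], [1.0, 1.6], [1.7, 1.5]]
text_feats = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1],
              [0.1, 0.2, 0.7], [0.2, 0.2, 0.6], [0.1, 0.3, 0.6]]

K = center_kernel(rbf_kernel(image_feats, gamma=1.0))
Y = center_columns(text_feats)
T = kernel_pls_scores(K, Y, n_components=2)
Y_hat = fit_training_targets(T, Y)
```

In this sketch the score vectors in `T` act as a shared low-dimensional representation of the paired samples, and `Y_hat` is the part of the centered text features they explain on the training set. How the paper incorporates label supervision or maps unseen queries at retrieval time is not stated in the abstract, so neither is modelled here.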
