Accumulated reconstruction error vector (AREV): a semantic representation for cross-media retrieval

Cross-media retrieval aims to automatically perform the content-based search procedure among various media types (e.g., image, video and text), in which media representation plays an important role for providing the heterogeneous similarity measure. In this work, a novel semantic representation of cross-media, called accumulated reconstruction error vector (AREV), is proposed, which includes category-specific dictionary learning, media sample reconstruction, and accumulative reconstruction error concatenation. Instead of directly learning the correlation relationship among heterogeneous items in the same semantic groups, the AREV projects individually their original feature descriptions into a shared semantic space, in which each component is semantic consistent for various media types due to the consistency in category information. Experiments on the commonly used datasets, i.e. Wikipedia dataset and NUS-Wide dataset, show the good performance in terms of effectiveness and efficiency.

[1]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[2]  Yi Yang,et al.  Ranking with local regression and global alignment for cross media retrieval , 2009, ACM Multimedia.

[3]  Yi Yang,et al.  Image Attribute Adaptation , 2014, IEEE Transactions on Multimedia.

[4]  Deepu Rajan,et al.  Multi-modal fusion for associated news story retrieval , 2013, Multimedia Tools and Applications.

[5]  Guodong Guo,et al.  Content-based audio classification and retrieval by support vector machines , 2003, IEEE Trans. Neural Networks.

[6]  Guizhong Liu,et al.  A new ROI based image retrieval system using an auxiliary Gaussian weighting scheme , 2012, Multimedia Tools and Applications.

[7]  Francesco G. B. De Natale,et al.  A Stochastic Approach to Image Retrieval Using Relevance Feedback and Particle Swarm Optimization , 2010, IEEE Transactions on Multimedia.

[8]  Goujun Lu,et al.  Indexing and Retrieval of Audio: A Survey , 2001, Multimedia Tools and Applications.

[9]  Yao Zhao,et al.  Frame Fusion for Video Copy Detection , 2011, IEEE Transactions on Circuits and Systems for Video Technology.

[10]  Nicu Sebe,et al.  Content-based multimedia information retrieval: State of the art and challenges , 2006, TOMCCAP.

[11]  David A. Ross,et al.  Survey and Evaluation of Audio Fingerprinting Schemes for Mobile Query-by-Example Applications , 2011, ISMIR.

[12]  Petros Daras,et al.  Search and Retrieval of Rich Media Objects Supporting Multiple Multimodal Queries , 2012, IEEE Transactions on Multimedia.

[13]  Nicole Immorlica,et al.  Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.

[14]  Qi Tian,et al.  Nearest-neighbor method using multiple neighborhood similarities for social media data mining , 2012, Neurocomputing.

[15]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[16]  Tie-Yan Liu,et al.  Learning to rank for information retrieval , 2009, SIGIR.

[17]  James Allan,et al.  Approaches to passage retrieval in full text information systems , 1993, SIGIR.

[18]  Tat-Seng Chua,et al.  NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.

[19]  Jian Pei,et al.  Parallel field alignment for cross media retrieval , 2013, ACM Multimedia.

[20]  Qi Tian,et al.  Multi-feature metric learning with knowledge transfer among semantics and social tagging , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[22]  YangYi,et al.  Harmonizing Hierarchical Manifolds for Multimedia Document Semantics Understanding and Cross-Media Retrieval , 2008 .

[23]  Yao Zhao,et al.  Joint Optimization Toward Effective and Efficient Image Search , 2013, IEEE Transactions on Cybernetics.

[24]  Nicu Sebe,et al.  Feature Weighting via Optimal Thresholding for Video Analysis , 2013, 2013 IEEE International Conference on Computer Vision.

[25]  Yi Yang,et al.  A Multimedia Retrieval Framework Based on Semi-Supervised Ranking and Relevance Feedback , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[27]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[28]  Xiaohua Zhai,et al.  Tri-space and ranking based heterogeneous similarity measure for cross-media retrieval , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[29]  Roger Levy,et al.  A new approach to cross-modal multimedia retrieval , 2010, ACM Multimedia.

[30]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[31]  Qi Tian,et al.  ${\rm S}^{3}{\rm MKL}$: Scalable Semi-Supervised Multiple Kernel Learning for Real-World Image Applications , 2012, IEEE Transactions on Multimedia.

[32]  Yao Zhao,et al.  Multimodal Fusion for Video Search Reranking , 2010, IEEE Transactions on Knowledge and Data Engineering.