A unified framework for multimodal retrieval

In this paper, a unified framework for multimodal content retrieval is presented. The proposed framework supports retrieval of rich media objects as unified sets of different modalities (image, audio, 3D, video and text) by efficiently combining all heterogeneous monomodal similarities into a global one according to an automatic weighting scheme. A multimodal space is then constructed to capture the semantic correlations among the modalities. In contrast to existing techniques, the proposed method can also handle external multimodal queries by embedding them into the already constructed multimodal space through a space-mapping procedure based on submanifold analysis. Experiments on five real multimodal datasets show that the proposed approach outperforms competing methods.
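The abstract does not spell out the weighting scheme or the space-mapping step, so the following is only a minimal Python sketch of the overall pipeline under stated assumptions. Per-modality distances between a query and each database object are min-max normalized, weighted by a hypothetical spread-based rule (`automatic_weights`), and summed into a global distance. The query is then placed in an already-built low-dimensional space as a heat-kernel weighted average of its nearest neighbors' coordinates (`embed_query`). All function names, the weighting rule, and the heat-kernel interpolation are illustrative assumptions standing in for the paper's own automatic weighting and submanifold-based mapping. The embedding coordinates are taken as given; the manifold-learning step that would produce them is not shown.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical sketch: the weighting rule and the out-of-sample mapping
# below are assumptions, not the method described in the paper.

def minmax_normalize(values: Sequence[float]) -> List[float]:
    """Rescale one modality's distances to [0, 1] so modalities are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def automatic_weights(normalized: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed rule: a modality whose normalized distances spread more widely
    is treated as more discriminative and receives a larger weight."""
    spreads = {}
    for modality, dists in normalized.items():
        mean = sum(dists) / len(dists)
        spreads[modality] = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    total = sum(spreads.values())
    if total == 0:
        return {m: 1.0 / len(spreads) for m in spreads}
    return {m: s / total for m, s in spreads.items()}

def fuse_distances(per_modality: Dict[str, Sequence[float]]) -> List[float]:
    """Combine monomodal query-to-object distances into one global distance per object."""
    normalized = {m: minmax_normalize(d) for m, d in per_modality.items()}
    weights = automatic_weights(normalized)
    n = len(next(iter(normalized.values())))
    return [sum(weights[m] * normalized[m][i] for m in normalized) for i in range(n)]

def embed_query(global_dist: Sequence[float], coords: Sequence[Sequence[float]],
                k: int = 3, sigma: float = 1.0) -> List[float]:
    """Assumed out-of-sample mapping: heat-kernel weighted average of the
    embedded coordinates of the query's k nearest database objects."""
    nearest = sorted(range(len(global_dist)), key=lambda i: global_dist[i])[:k]
    w = [math.exp(-(global_dist[i] ** 2) / sigma) for i in nearest]
    total = sum(w)
    dim = len(coords[0])
    return [sum(w[j] * coords[i][c] for j, i in enumerate(nearest)) / total
            for c in range(dim)]

# Toy usage: distances from one query to four database objects per modality.
per_modality = {
    "image": [0.2, 0.9, 0.4, 0.8],
    "audio": [3.0, 1.0, 2.0, 5.0],
    "text": [0.5, 0.5, 0.5, 0.5],
}
weights = automatic_weights({m: minmax_normalize(d) for m, d in per_modality.items()})
global_d = fuse_distances(per_modality)
coords = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-D multimodal space
query_point = embed_query(global_d, coords, k=2)
```

Normalizing before weighting keeps a modality with a large raw distance range (here, audio) from dominating the sum, while a modality whose distances are constant (here, text) carries no ranking information and receives zero weight under this assumed rule.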
