A novel framework for retrieval and interactive visualization of multimodal data

With the abundance of multimedia in web databases and the growing user demand for content in many modalities, such as images and sounds, new methods for retrieving and visualizing multimodal media are required. This paper proposes novel techniques for the retrieval and visualization of multimodal data, i.e. documents consisting of several modalities. A cross-modal retrieval framework is presented in which the results of several unimodal retrieval systems are fused into a single multimodal list by introducing a cross-modal distance. For presenting the retrieved results, a multimodal visualization framework is also proposed, which extends existing unimodal similarity-based visualization methods to multimodal data. The similarity between two multimodal objects is defined as the weighted sum of their unimodal similarities, with the weights determined through an interactive user feedback scheme. Experimental results show that the cross-modal framework outperforms unimodal and other multimodal approaches, while the visualization framework enhances existing visualization methods by efficiently exploiting multimodality and user feedback.
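The weighted-sum similarity described above can be illustrated with a minimal sketch. The function name `multimodal_similarity`, the modality labels, and the sample values are hypothetical, and dividing by the sum of the weights is an assumption made here so that the result stays on the same scale as the unimodal similarities; the abstract does not specify a normalization or how the feedback scheme sets the weights.

```python
from typing import Dict


def multimodal_similarity(unimodal_sims: Dict[str, float],
                          weights: Dict[str, float]) -> float:
    """Weighted sum of per-modality similarities between two objects.

    Both dictionaries are keyed by modality name (hypothetical labels).
    The result is divided by the total weight, which is an assumption
    of this sketch and not something stated in the abstract.
    """
    total = sum(weights[m] for m in unimodal_sims)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[m] * s for m, s in unimodal_sims.items())
    return weighted / total


# Illustrative similarities between two objects, one value per modality.
sims = {"image": 0.8, "sound": 0.4}

# Equal weights: both modalities contribute equally.
equal = multimodal_similarity(sims, {"image": 1.0, "sound": 1.0})

# Weights shifted toward images, as user feedback might do.
image_heavy = multimodal_similarity(sims, {"image": 3.0, "sound": 1.0})
```

Raising the weight of one modality pulls the combined similarity toward that modality's value, which is the mechanism through which user feedback could reshape the visualization.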
