Manifold Learning Based Cross-media Retrieval: A Solution to Media Object Complementary Nature

Media objects of different modalities always coexist and are naturally complementary to each other, both semantically and in terms of modality. In this paper, we propose a manifold-learning-based cross-media retrieval approach that addresses two basic but crucial questions in understanding media object semantics and in cross-media retrieval. First, given this semantic complementarity, how can we represent concurrent media objects and fuse the complementary information they carry to understand their integrated semantics precisely? Second, given this modality complementarity, how can we bridge modalities to establish a cross-index and facilitate cross-media retrieval? To solve these two problems, we first construct a Multimedia Document (MMD) Semi-Semantic Graph (MMDSSG) and then apply Multidimensional Scaling to create an MMD Semantic Space (MMDSS). We also propose long-term and short-term feedback to boost system performance: long-term feedback refines the MMDSSG, and short-term feedback introduces new items that are not in the training set into the MMDSS. Since all MMDs and their component media objects of different modalities lie in the MMDSS and are indexed uniformly by their MMDSS coordinates regardless of modality, this semantic space acts as a bridge between media objects of different modalities, and cross-media retrieval can be achieved easily. Experimental results are encouraging and indicate that the proposed approach is effective.
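The pipeline summarized above (a graph over MMDs, Multidimensional Scaling into a shared space, and modality-independent indexing by coordinates) can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions: the graph's edge weights are taken as given pairwise semantic distances, the graph is assumed connected, distances between non-adjacent items are graph shortest paths (Floyd-Warshall, an Isomap-style choice), classical (Torgerson) MDS is used as the MDS variant, and `place_new_item` is a hypothetical stand-in for short-term feedback that simply averages the coordinates of items a user marks as relevant. All function names are illustrative.

```python
import math
import random

INF = float("inf")


def shortest_path_distances(n, edges):
    """All-pairs shortest paths (Floyd-Warshall) on an undirected weighted graph.

    `edges` is a list of (i, j, weight) tuples; each weight is assumed to be a
    given semantic distance between two items. The graph is assumed connected.
    """
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j, w in edges:
        d[i][j] = d[j][i] = min(d[i][j], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def classical_mds(dist, dim, iters=500, seed=0):
    """Classical MDS: B = -1/2 * J D^2 J, then the top `dim` eigenpairs of B.

    Eigenpairs come from power iteration with deflation, which finds the
    largest-magnitude eigenvalue; this sketch assumes the leading eigenvalues
    of B are positive (true for Euclidean-like distances).
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    mean = [sum(r) / n for r in sq]  # row means (= column means; D is symmetric)
    total = sum(mean) / n
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + total) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dim for _ in range(n)]
    for k in range(dim):
        v = [rng.random() for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
            # Rayleigh quotient v^T B v estimates the eigenvalue.
            lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass converges to the next eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def retrieve(coords, modality, query, target_modality, k):
    """Rank items of `target_modality` by Euclidean distance to `query`
    in the shared space, whatever the query's own modality is."""
    q = coords[query]
    candidates = [i for i in range(len(coords))
                  if modality[i] == target_modality and i != query]
    candidates.sort(key=lambda i: math.dist(q, coords[i]))
    return candidates[:k]


def place_new_item(coords, relevant):
    """Hypothetical short-term feedback: place an unseen item at the centroid
    of the training items a user marked as relevant to it."""
    dim = len(coords[0])
    return [sum(coords[i][c] for i in relevant) / len(relevant) for c in range(dim)]


if __name__ == "__main__":
    # Toy data: four items on a chain whose modalities alternate.
    edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
    modality = ["image", "audio", "image", "audio"]
    coords = classical_mds(shortest_path_distances(4, edges), dim=1)
    print(retrieve(coords, modality, 0, "audio", 2))  # [1, 3]
```

Because every item is reduced to a coordinate vector regardless of modality, an image query can rank audio clips (or the reverse) with one distance function; this is the modality independence the abstract attributes to the MMDSS.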
