Parallel field alignment for cross media retrieval

Cross media retrieval systems have received increasing interest in recent years. Because of the semantic gap between the low-level features and high-level semantic concepts of multimedia data, many researchers have explored joint-model techniques for cross media retrieval. Previous joint-model approaches usually follow one of two designs: (a) fusing features from different media data; or (b) learning a separate model for each medium and fusing the model outputs. However, fusing features or outputs loses both low- and high-level abstraction information of the media data, so neither design really reveals the semantic correlations among heterogeneous multimedia data. In this paper, we introduce a novel method for the cross media retrieval task, named Parallel Field Alignment Retrieval (PFAR), which integrates a manifold alignment framework from the perspective of vector fields. Instead of fusing original features or outputs, we treat cross media retrieval as a manifold alignment problem and solve it using parallel fields. The proposed manifold alignment algorithm effectively preserves the metric of the data manifolds, models heterogeneous media data, and projects their relationships into intermediate latent semantic spaces during alignment. Once the manifolds are aligned, the semantic correlations are determined, and the cross media retrieval task can be resolved using them. Comprehensive experimental results demonstrate the effectiveness of our approach.
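To make the final retrieval step concrete, the following is a minimal sketch of how a query from one medium could be matched against items from another once both have coordinates in a shared latent space. It is not the PFAR algorithm: the alignment itself is not shown, the latent coordinates are made-up toy values, and the function name `rank_by_cosine` and the choice of cosine similarity are assumptions made for illustration only.

```python
import numpy as np

def rank_by_cosine(query, candidates):
    """Rank candidate vectors by cosine similarity to a query vector.

    Returns (order, scores), where order lists candidate indices from
    most to least similar and scores holds the cosine similarities.
    """
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(-scores), scores

# Hypothetical latent coordinates, assumed to come from some prior
# alignment step (toy values, not produced by PFAR).
image_query = np.array([1.0, 0.0])        # one image, used as the query
text_items = np.array([[0.0, 1.0],        # three text documents
                       [0.9, 0.1],
                       [-1.0, 0.0]])

order, scores = rank_by_cosine(image_query, text_items)
print(order)  # indices of text items, best match first
```

In this sketch the quality of the ranking depends entirely on the alignment that produced the latent coordinates, which is the part the proposed method addresses.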
