Multi-modal learning for social image classification

There is growing interest in social image classification because of its importance in web-based image applications. Although many approaches to image classification exist, integrating the multi-modal content of social images remains a major challenge, since the textual content and the visual content are represented in two heterogeneous feature spaces. In this study, we propose a multi-modal learning algorithm that fuses the multiple features seamlessly through their correlation. Specifically, we learn two linear classification modules, one for each type of feature, and integrate them by l2 normalization via a joint model. With the joint model, the classification based on the visual feature can be reinforced by the classification based on the textual feature, and vice versa. A test image can then be classified using both the textual and visual features by combining the results of the two classifiers. To evaluate the approach, we conduct experiments on real-world datasets, and the results show the superiority of the proposed algorithm over the baselines.
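
The general idea of two coupled linear classifiers can be illustrated with a minimal sketch. The abstract does not give the exact objective, so the formulation below is an assumption made for illustration only: each modality gets a ridge-regularized least-squares classifier, and a squared l2 penalty on the disagreement between their predictions (weight `mu`) couples them, so that each classifier is pulled toward the other. The function names, the parameters `mu` and `lam`, the alternating solver, and the score averaging at test time are all hypothetical choices, not the authors' method.

```python
import numpy as np

def fit_joint_linear(Xv, Xt, Y, mu=1.0, lam=0.1, n_iter=20):
    """Alternately solve for visual (Wv) and textual (Wt) linear classifiers.

    Assumed objective (illustrative, not taken from the paper):
        ||Xv Wv - Y||^2 + ||Xt Wt - Y||^2
        + mu * ||Xv Wv - Xt Wt||^2 + lam * (||Wv||^2 + ||Wt||^2)
    """
    dv, dt, c = Xv.shape[1], Xt.shape[1], Y.shape[1]
    Wv = np.zeros((dv, c))
    Wt = np.zeros((dt, c))
    # The left-hand sides of both normal equations stay fixed across iterations.
    Av = (1.0 + mu) * Xv.T @ Xv + lam * np.eye(dv)
    At = (1.0 + mu) * Xt.T @ Xt + lam * np.eye(dt)
    for _ in range(n_iter):
        # With Wt fixed, the visual classifier fits the labels plus the
        # textual predictions; the textual update is symmetric.
        Wv = np.linalg.solve(Av, Xv.T @ (Y + mu * (Xt @ Wt)))
        Wt = np.linalg.solve(At, Xt.T @ (Y + mu * (Xv @ Wv)))
    return Wv, Wt

def predict_joint(Xv, Xt, Wv, Wt):
    """Combine the two classifiers by averaging their scores (late fusion)."""
    scores = 0.5 * (Xv @ Wv + Xt @ Wt)
    return scores.argmax(axis=1)

# Toy usage on synthetic two-class data (all values are made up).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
Y = np.eye(2)[y]                                   # one-hot labels
Cv = np.array([[2.0, 0.0, 1.0], [0.0, 2.0, 1.0]])  # per-class visual centers
Ct = np.array([[1.0, -1.0, 0.0, 0.0], [-1.0, 1.0, 0.0, 2.0]])
Xv = Y @ Cv + 0.1 * rng.standard_normal((40, 3))   # "visual" features
Xt = Y @ Ct + 0.1 * rng.standard_normal((40, 4))   # "textual" features
Wv, Wt = fit_joint_linear(Xv, Xt, Y)
pred = predict_joint(Xv, Xt, Wv, Wt)
```

Setting `mu=0` in this sketch decouples the two classifiers, which gives a simple point of comparison for the effect of the coupling term.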
