Classify social image by integrating multi-modal content

The volume of social images is growing rapidly with the development of social networks and digital cameras. Besides their visual content, these images are usually annotated with textual tags. Organizing and managing such large collections automatically is an urgent need. Image classification is the basic task underlying these applications and has attracted great research effort. Although image classification has been studied extensively, integrating the multi-modal content of social images for classification remains a considerable challenge, since the textual content and visual content are represented in two heterogeneous feature spaces. In this paper, we propose a multi-modal learning method that integrates multi-modal features seamlessly through their correlation. Specifically, we learn two linear classification modules, one for each type of feature, and integrate them by the l2 normalization method via a joint model. Each classifier is regularized with the l2,1 norm to reduce the effect of noisy features by selecting a subset of more important features. With the joint model, the classification based on visual features can be reinforced by the classification based on textual features, and vice versa. A test image is then classified based on both its textual and visual features by combining the results of the two classifiers. Experiments conducted on real-world social image datasets demonstrate the superiority of our proposed method over representative baselines.
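The abstract does not state the objective function, so the following is only a minimal sketch of how such a joint model might look, not the authors' formulation. It assumes a squared-error loss for each view, a squared l2 term that penalizes disagreement between the visual and textual predictions, and l2,1 penalties on both weight matrices. The names `joint_objective`, `predict`, `lam`, and `mu` are hypothetical, as is the choice to combine the two classifiers by summing their class scores.

```python
import numpy as np

def l21_norm(W):
    """l2,1 norm: sum of the l2 norms of the rows of W (one row per feature).

    Penalizing it pushes whole rows toward zero, which drops noisy features.
    """
    return float(np.sum(np.linalg.norm(W, axis=1)))

def joint_objective(Xv, Xt, Y, Wv, Wt, lam=0.1, mu=1.0):
    """Hypothetical joint loss over a visual view (Xv, Wv) and a textual view (Xt, Wt).

    Xv: (n, dv) visual features, Xt: (n, dt) textual features,
    Y: (n, c) one-hot labels, Wv: (dv, c), Wt: (dt, c).
    """
    # Assumed squared-error fit of each linear classifier to the labels.
    fit = np.sum((Xv @ Wv - Y) ** 2) + np.sum((Xt @ Wt - Y) ** 2)
    # Assumed l2 agreement term: the two views should predict alike.
    agree = mu * np.sum((Xv @ Wv - Xt @ Wt) ** 2)
    # l2,1 penalties select a subset of features in each view.
    sparse = lam * (l21_norm(Wv) + l21_norm(Wt))
    return float(fit + agree + sparse)

def predict(Xv, Xt, Wv, Wt):
    """Classify by summing the class scores of the two linear classifiers."""
    scores = Xv @ Wv + Xt @ Wt
    return np.argmax(scores, axis=-1)
```

In this sketch the agreement term is what lets one view reinforce the other: a confident textual prediction pulls the visual classifier toward the same class during training, and the reverse also holds.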
