Image-Text Dual Model for Small-Sample Image Classification

Small-sample classification is a challenging problem in computer vision with many applications. In this paper, we propose an image-text dual model to improve classification performance on small-sample datasets. The proposed dual model consists of two sub-models: an image classification model and a text classification model. After training the two sub-models separately, we fuse them with a novel method rather than simply combining their outputs. The dual model uses text information to mitigate the difficulty of training deep models on small-sample datasets. To demonstrate its effectiveness, we conduct extensive experiments on LabelMe and UIUC-Sports. Experimental results show that the proposed model achieves the highest image classification accuracy among all compared models on both datasets.
