Multiple deep CNN for image annotation

Improving performance has long been a central research goal in automatic image annotation. This paper applies currently popular deep learning models to the task. We propose a model that combines multiple convolutional neural networks (CNNs) for image annotation and achieves satisfactory performance. First, we take three classical CNNs and examine the annotation accuracy of each model individually. Then, to exploit the strong feature representation capabilities of deep CNNs, we extract the last two layers of each model and merge them into a new combined feature. Finally, we form our combination models by concatenating these features across the CNN models and feed the concatenated features to a linear SVM classifier for image annotation. Experimental results on the ImageCLEF2012 image annotation dataset show that our combination method outperforms both traditional classifiers and the individual CNN models.
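The combination step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the synthetic activations standing in for real CNN outputs, the helper names (`combine_features`, `train_linear_svm`, `fake_activations`), and the simple subgradient-descent trainer are all assumptions made for illustration. It also handles a single concept as a binary problem; how the paper handles multiple labels is not stated in the abstract.

```python
import random

def combine_features(per_model_layers):
    """Merge the (penultimate, last) activations of each CNN,
    then concatenate the merged vectors across all CNNs."""
    combined = []
    for penultimate, last in per_model_layers:
        combined.extend(penultimate)
        combined.extend(last)
    return combined

def score(w, b, x):
    """Linear decision value w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=50, seed=0):
    """Subgradient descent on the L2-regularised hinge loss.
    Labels in y must be +1 or -1. Hyperparameters are illustrative."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * score(w, b, X[i]) < 1
            # Shrink the weights (regularisation term).
            w = [wj * (1 - lr * lam) for wj in w]
            if violated:
                # Hinge-loss subgradient step for a margin violation.
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

# Hypothetical (penultimate, last) layer sizes for three CNNs.
LAYER_SIZES = [(4, 2), (3, 2), (5, 2)]

def fake_activations(label, rng):
    """Synthetic stand-in for real CNN activations of one image."""
    return [
        ([label + rng.uniform(-0.2, 0.2) for _ in range(p)],
         [label + rng.uniform(-0.2, 0.2) for _ in range(l)])
        for p, l in LAYER_SIZES
    ]

rng = random.Random(0)
labels = [1 if i % 2 == 0 else -1 for i in range(40)]
features = [combine_features(fake_activations(lbl, rng)) for lbl in labels]

w, b = train_linear_svm(features, labels)
predictions = [1 if score(w, b, x) >= 0 else -1 for x in features]
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
print(len(features[0]), accuracy)
```

In practice the activations would come from pretrained networks and the classifier from an established SVM library; the sketch only shows how the per-model features are merged and concatenated before classification.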
