Large scale automatic image annotation based on convolutional neural network

Abstract Automatic image annotation is one of the most important challenges in computer vision and is critical to many real-world research efforts and applications. In this paper, we focus on large-scale image annotation with deep learning. First, in existing image data, especially images collected from the web, most of the original labels are inaccurate or imprecise. We propose a Multitask Voting (MV) method that improves the accuracy of the original annotations to a certain extent, thereby improving model training. Second, the MV method selects an adaptive number of labels per image, whereas most existing methods pre-specify the number of tags to be selected. Third, we construct MVAIACNN, a large-scale image annotation model based on a convolutional neural network. Finally, we evaluate its performance experimentally on the MIRFlickr25K and NUS-WIDE datasets and compare it with other methods, demonstrating the effectiveness of MVAIACNN.
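The contrast between a pre-specified number of tags and an adaptive number of tags can be illustrated with a small sketch. The abstract does not describe how MV combines its votes, so the code below is only a generic vote-counting illustration and not the authors' method. The function names, the score threshold, and the minimum vote count are all hypothetical choices made for this example.

```python
from typing import Dict, List


def top_k_tags(scores: Dict[str, float], k: int) -> List[str]:
    """Fixed-size selection: always return k tags, highest score first."""
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda tag: (-scores[tag], tag))
    return ranked[:k]


def voted_tags(
    task_scores: List[Dict[str, float]],
    score_threshold: float = 0.5,  # hypothetical value, not from the paper
    min_votes: int = 2,            # hypothetical value, not from the paper
) -> List[str]:
    """Adaptive selection: keep a tag when enough tasks vote for it.

    A task "votes" for a tag when its score reaches score_threshold.
    The number of returned tags depends on the image, not on a fixed k.
    """
    votes: Dict[str, int] = {}
    for scores in task_scores:
        for tag, score in scores.items():
            if score >= score_threshold:
                votes[tag] = votes.get(tag, 0) + 1
    return sorted(tag for tag, count in votes.items() if count >= min_votes)


# Made-up scores from three hypothetical tasks for a single image.
example_scores = [
    {"sky": 0.9, "sea": 0.7, "dog": 0.1},
    {"sky": 0.8, "sea": 0.4, "dog": 0.2},
    {"sky": 0.6, "sea": 0.6, "dog": 0.7},
]

fixed = top_k_tags(example_scores[0], k=3)   # always three tags
adaptive = voted_tags(example_scores)        # only tags with enough votes
```

With a fixed k of 3, the weakly supported tag "dog" is always kept, while the voting rule keeps only the tags that at least two tasks agree on. Under this scheme a noisy original label with little support would be dropped instead of being forced into the output.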
