A novel image annotation model based on content representation with multi-layer segmentation

Abstract: Automatic image annotation is an important problem in semantic-based image retrieval, and it remains challenging because of the semantic gap. In this paper, we propose a novel model with three parts. The first is multi-layer image segmentation: in the first layer, saliency analysis and normalized cut are combined to segment images into semantic regions; in the second layer, these semantic regions are further segmented into grids. The second is image content representation by a region-based bag-of-words (RBoW) model, a variant of the BoW model. The third is a second-order CRF, which we adopt to account for the correlations between labels and improve the accuracy of automatic image annotation. Experimental results show that our multi-layer segmentation-based image annotation model achieves promising performance for multi-labeling and outperforms both a model based on single-layer segmentation and previous algorithms on the Corel 5K and Pascal VOC 2007 datasets.
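As a rough illustration of the second-layer grid segmentation and the region-based bag-of-words idea described in the abstract, the following Python sketch builds one visual-word histogram per semantic region. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`nearest_word`, `region_bow`), the mean-intensity grid descriptor, the grid size, and the toy codebook are all hypothetical, and the first-layer region map is simply given as input rather than computed with saliency analysis and normalized cut.

```python
from collections import Counter


def nearest_word(feature, codebook):
    """Return the index of the codebook entry closest to `feature`
    (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, codebook[k])),
    )


def region_bow(region_map, image, codebook, cell=2):
    """Build one visual-word histogram per region.

    region_map: 2D list of region ids (stand-in for first-layer output).
    image:      2D list of gray values with the same shape.
    codebook:   list of feature tuples (assumed to be learned elsewhere).
    cell:       side length of the square grid cells (second layer).
    """
    rows, cols = len(region_map), len(region_map[0])
    hists = {}
    for r0 in range(0, rows, cell):
        for c0 in range(0, cols, cell):
            pixels = [
                (r, c)
                for r in range(r0, min(r0 + cell, rows))
                for c in range(c0, min(c0 + cell, cols))
            ]
            # Assign the grid cell to the region covering most of its pixels.
            region = Counter(region_map[r][c] for r, c in pixels).most_common(1)[0][0]
            # Hypothetical descriptor: mean intensity of the cell.
            values = [image[r][c] for r, c in pixels]
            feature = (sum(values) / len(values),)
            word = nearest_word(feature, codebook)
            hists.setdefault(region, [0] * len(codebook))[word] += 1
    return hists


# Toy example: two regions (left and right halves of a 4x4 image).
region_map = [[0, 0, 1, 1] for _ in range(4)]
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 90, 90],
    [10, 10, 90, 90],
]
codebook = [(0.0,), (100.0,), (255.0,)]
hists = region_bow(region_map, image, codebook, cell=2)
```

In this toy setup each region ends up with its own histogram over the three codebook entries, so a region's content is represented separately from the rest of the image rather than pooled into one image-level BoW vector. A realistic pipeline would swap in a richer local descriptor and a codebook learned by clustering.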
