Correlative multi-label multi-instance image annotation

In this paper, each image is viewed both as a bag of local regions and as a global whole. A novel method is developed for multi-label multi-instance image annotation, in which image-level (bag-level) labels and region-level (instance-level) labels are both obtained. The associations between semantic concepts and visual features are mined at both the image level and the region level. Inter-label correlations are captured by a co-occurrence matrix of concept pairs. Cross-level label coherence encodes the consistency between the image-level labels and the region-level labels. The associations between visual features and semantic concepts, the correlations among the multiple labels, and the cross-level label coherence are fully leveraged to improve annotation performance. A structural max-margin technique is used to formulate the proposed model, and multiple interrelated classifiers are learned jointly. To leverage the available image-level labeled samples for model training, the region-level labels on the training set are first identified by building correspondences between the multiple bag-level labels and the image regions. JEC-distance-based kernels are employed to measure the similarities both between images and between regions. Experimental results on the real image datasets MSRC and Corel demonstrate the effectiveness of our method.
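
As a rough illustration of the co-occurrence matrix of concept pairs mentioned above, the following sketch counts how often two concepts label the same image. The abstract does not say how the matrix is computed or normalized, so the raw-count definition, the function name `cooccurrence_matrix`, and the toy label matrix are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np


def cooccurrence_matrix(label_matrix):
    """Count, for each concept pair (i, j), the images labeled with both.

    label_matrix: binary array of shape (n_images, n_concepts), where
    entry (n, i) is 1 if image n carries concept i. The raw-count
    definition is an assumption; the paper may normalize differently.
    """
    Y = np.asarray(label_matrix, dtype=np.int64)
    # (Y^T Y)[i, j] sums Y[n, i] * Y[n, j] over images n, which is the
    # number of images carrying both concepts; the diagonal holds the
    # per-concept image counts.
    return Y.T @ Y


# Hypothetical toy data: 4 images, 3 concepts (sky, grass, cow).
Y = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 0],
])
C = cooccurrence_matrix(Y)
print(C)
```

In this toy example, `C[1, 2]` counts the images labeled with both grass and cow, and the matrix is symmetric by construction.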
