Multimodal semi-supervised learning for image classification

In image categorization the goal is to decide if an image belongs to a certain category or not. A binary classifier can be learned from manually labeled images; while using more labeled examples improves performance, obtaining the image labels is a time consuming process. We are interested in how other sources of information can aid the learning process given a fixed amount of labeled images. In particular, we consider a scenario where keywords are associated with the training images, e.g. as found on photo sharing websites. The goal is to learn a classifier for images alone, but we will use the keywords associated with labeled and unlabeled images to improve the classifier using semi-supervised learning. We first learn a strong Multiple Kernel Learning (MKL) classifier using both the image content and keywords, and use it to score unlabeled images. We then learn classifiers on visual features only, either support vector machines (SVM) or least-squares regression (LSR), from the MKL output values on both the labeled and unlabeled images. In our experiments on 20 classes from the PASCAL VOC'07 set and 38 from the MIR Flickr set, we demonstrate the benefit of our semi-supervised approach over only using the labeled images. We also present results for a scenario where we do not use any manual labeling but directly learn classifiers from the image tags. The semi-supervised approach also improves classification accuracy in this case.

[1]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[2]  Pietro Perona,et al.  A Visual Category Filter for Google Images , 2004, ECCV.

[3]  Antonio Torralba,et al.  Semi-Supervised Learning in Gigantic Image Collections , 2009, NIPS.

[4]  Cordelia Schmid,et al.  Coloring Local Feature Extraction , 2006, ECCV.

[5]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[7]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[8]  Gang Wang,et al.  Learning image similarity from Flickr groups using Stochastic Intersection Kernel MAchines , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[9]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2004 .

[10]  Trevor Darrell,et al.  Learning Visual Representations using Images with Captions , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[12]  Antonio Criminisi,et al.  Harvesting Image Databases from the Web , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[13]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[14]  Derek Hoiem,et al.  Building text features for object image classification , 2009, CVPR.

[15]  Yves Grandvalet,et al.  Y.: SimpleMKL , 2008 .

[16]  David A. Forsyth,et al.  Animals on the Web , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[17]  Shumeet Baluja,et al.  Probabilistic Modeling for Face Orientation Discrimination: Learning from Labeled and Unlabeled Data , 1998, NIPS.

[18]  Yee Whye Teh,et al.  Names and faces in the news , 2004, CVPR 2004.

[19]  Daniel P. Huttenlocher,et al.  Landmark classification in large-scale image collections , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[20]  Geoffrey E. Hinton,et al.  Coaching variables for regression and classification , 1998, Stat. Comput..

[21]  Mark J. Huiskes,et al.  The MIR flickr retrieval evaluation , 2008, MIR '08.

[22]  Cordelia Schmid,et al.  TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[23]  Alexander Zien,et al.  Semi-Supervised Learning , 2006 .

[24]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).