Building text features for object image classification

We introduce a text-based image feature and demonstrate that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the Internet. We do not inspect or correct the tags and expect that they are noisy. We obtain the text feature of an unannotated image from the tags of its k-nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples. Our text feature may not change, because the auxiliary dataset likely contains a similar picture. While the tags associated with images are noisy, they are more stable when appearance changes. We test the performance of this feature using PASCAL VOC 2006 and 2007 datasets. Our feature performs well, consistently improves the performance of visual object classifiers, and is particularly effective when the training dataset is small.

[1]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[2]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[3]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Yee Whye Teh,et al.  Names and faces in the news , 2004, CVPR 2004.

[5]  Antonio Criminisi,et al.  Harvesting Image Databases from the Web , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[6]  Stefano Soatto,et al.  Filtering Internet image search results towards keyword based category recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Pietro Perona,et al.  A Visual Category Filter for Google Images , 2004, ECCV.

[8]  Fei-Fei Li,et al.  OPTIMOL: Automatic Online Picture Collection via Incremental Model Learning , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Pietro Perona,et al.  A sparse object category model for efficient learning and exhaustive recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[10]  Trevor Darrell,et al.  Learning Visual Representations using Images with Captions , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Gang Wang,et al.  Object image retrieval by exploiting online knowledge resources , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Boris Babenko,et al.  ImprovingWeb-based Image Search via Content Based Clustering , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[13]  Alexei A. Efros,et al.  Scene completion using millions of photographs , 2008, Commun. ACM.

[14]  R. Fergus,et al.  Tiny images , 2007 .