Learning Visual Attributes

We present a probabilistic generative model of visual attributes, together with an efficient learning algorithm. Attributes are visual qualities of objects, such as 'red', 'striped', or 'spotted'. The model sees attributes as patterns of image segments, repeatedly sharing some characteristic properties. These can be any combination of appearance, shape, or the layout of segments within the pattern. Moreover, attributes with general appearance are taken into account, such as the pattern of alternation of any two colors which is characteristic for stripes. To enable learning from unsegmented training images, the model is learnt discriminatively, by optimizing a likelihood ratio. As demonstrated in the experimental evaluation, our model can learn in a weakly supervised setting and encompasses a broad range of attributes. We show that attributes can be learnt starting from a text query to Google image search, and can then be used to recognize the attribute and determine its spatial extent in novel real-world images.

[1]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[2]  Andrew Zisserman,et al.  Texture classification: are filter banks necessary? , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[3]  Yanxi Liu,et al.  The Promise and Perils of Near-Regular Texture , 2004, International Journal of Computer Vision.

[4]  N. Jojic,et al.  Capturing image structure with probabilistic index maps , 2004, CVPR 2004.

[5]  N. Jojic,et al.  Capturing image structure with probabilistic index maps , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[6]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[7]  Keiji Yanai,et al.  Image region entropy: a measure of "visualness" of web images associated with one concept , 2005, MULTIMEDIA '05.

[8]  Cordelia Schmid,et al.  A sparse texture representation using local affine regions , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Nebojsa Jojic,et al.  LOCUS: learning object classes with unsupervised segmentation , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[10]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11]  Cordelia Schmid,et al.  Learning Color Names from Real-World Images , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.