Modeling Attributes from Category-Attribute Proportions

Attribute-based representations are widely used in visual recognition and retrieval because they are interpretable and generalize across categories. However, classic attribute learning requires manually labeling attributes on images, which is expensive and does not scale. In this paper, we propose to model attributes from category-attribute proportions, so that attributes can be learned without any attribute labels on the images. Specifically, given a multi-class image dataset with N categories, we model an attribute based on an N-dimensional category-attribute proportion vector, where each element characterizes the proportion of images in the corresponding category that have the attribute. Attribute learning can then be formulated as a learning from label proportions (LLP) problem. Our method is based on a recently proposed machine learning algorithm called $\propto$SVM. Finding the category-attribute proportions is much easier than manually labeling images, but it is still not a trivial task. We further propose to estimate the proportions from multiple modalities such as human commonsense knowledge, NLP tools, and other domain knowledge. The value of the proposed approach is demonstrated by various applications including modeling animal attributes, visual sentiment attributes, and scene attributes.
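To make the LLP formulation concrete, the following is a minimal sketch, not the $\propto$SVM algorithm itself. It substitutes a simpler technique: a logistic attribute scorer trained by gradient descent so that its mean prediction within each category matches that category's proportion. The function name `fit_from_proportions`, the toy two-dimensional features, and the proportion values are all assumptions made for illustration; no image-level attribute labels are used during fitting.

```python
import numpy as np

def fit_from_proportions(X, category, proportions, lr=0.5, steps=2000):
    """Fit a linear attribute scorer from category-level proportions only.

    X: (n, d) image features; category: (n,) category index per image;
    proportions: (N,) assumed fraction of each category having the attribute.
    Illustrative proportion-matching loss, not the proportion-SVM objective.
    """
    n_cat = len(proportions)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # per-image attribute probability
        slope = p * (1.0 - p)                    # derivative of the sigmoid
        grad_w = np.zeros_like(w)
        grad_b = 0.0
        for k in range(n_cat):
            in_k = category == k
            # Gap between predicted and given proportion for category k.
            gap = p[in_k].mean() - proportions[k]
            grad_w += 2.0 * gap * (slope[in_k][:, None] * X[in_k]).mean(axis=0)
            grad_b += 2.0 * gap * slope[in_k].mean()
        w -= lr * grad_w / n_cat
        b -= lr * grad_b / n_cat
    return w, b

# Toy data (hypothetical): 4 categories, 200 images each, 2-D features.
rng = np.random.default_rng(0)
proportions = np.array([0.1, 0.4, 0.6, 0.9])
per_cat = 200
X_parts, cat_parts, hidden_parts = [], [], []
for k, frac in enumerate(proportions):
    n_pos = int(round(frac * per_cat))
    # Hidden per-image attribute labels: used to generate features, never to train.
    y = np.concatenate([np.ones(n_pos), np.zeros(per_cat - n_pos)])
    x0 = (2.0 * y - 1.0) * 2.0 + rng.normal(size=per_cat)  # informative feature
    x1 = rng.normal(size=per_cat)                          # pure noise feature
    X_parts.append(np.column_stack([x0, x1]))
    cat_parts.append(np.full(per_cat, k))
    hidden_parts.append(y)
X = np.vstack(X_parts)
category = np.concatenate(cat_parts)
hidden = np.concatenate(hidden_parts)

w, b = fit_from_proportions(X, category, proportions)
pred = 1.0 / (1.0 + np.exp(-(X @ w + b)))
```

In this sketch the only supervision is the N-dimensional proportion vector (here N = 4); the hidden per-image labels exist solely so the fitted scorer can be checked afterwards.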
