What Visual Attributes Characterize an Object Class?

Visual attribute-based learning has had a large impact on many computer vision problems in recent years. Despite its usefulness, most existing work focuses only on predicting either the presence or the strength of pre-defined attributes. In this paper, we discuss how to automatically learn the visual attributes that characterize an object class. Starting from images of an object class collected from the Web, we first mine visual prototypes of attributes (i.e., a clean intermediate representation for learning attributes) by clustering multi-scale salient areas in noisy Web images with Gaussian mixtures. Second, we propose a joint optimization model that performs attribute learning together with feature selection. Because sparse approximation is adopted for feature selection during the joint optimization, each learned attribute tends to capture a more representative visual property, e.g., a stripe pattern (when texture features are selected) or yellow color (when color features are selected). Finally, to quantify the confidence of attributes and suppress the noisy attributes learned from the Web, we propose a ranking-based method to refine the learned attributes. Our approach ensures that the learned visual attributes are visually recognizable and representative, in contrast to manually constructed attributes [1] that contain properties that are difficult to visualize, e.g., “smelly” or “smart.” We evaluate our approach on two benchmark datasets and compare it with state-of-the-art approaches in two aspects: the quality of the learned visual attributes and their effectiveness in object categorization.
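The abstract does not spell out the sparse model, so the following is only a minimal sketch of how sparsity-driven feature selection can work, assuming a row-sparse weight matrix scored with an ℓ2,1 norm in the spirit of [19]. The matrix `W`, the helper names, and the toy values are all hypothetical and are not taken from the paper.

```python
import math

def row_norms(W):
    """Euclidean norm of each row of W (one row per feature dimension)."""
    return [math.sqrt(sum(w * w for w in row)) for row in W]

def l21_norm(W):
    """l2,1 norm: the sum of the row-wise Euclidean norms."""
    return sum(row_norms(W))

def select_features(W, k):
    """Indices of the k rows with the largest norm, strongest first."""
    norms = row_norms(W)
    return sorted(range(len(W)), key=lambda i: norms[i], reverse=True)[:k]

# Hypothetical weights: 4 feature dimensions (rows) x 2 attributes (columns).
# Rows that are entirely zero correspond to features the model has discarded.
W = [
    [3.0, 4.0],
    [0.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
]

penalty = l21_norm(W)          # regularization value for this W
kept = select_features(W, 2)   # the two feature dimensions that survive
```

Penalizing the sum of row norms pushes whole rows toward zero, so a feature is either kept for all attributes or dropped for all of them. Under this assumption, an attribute built mostly from texture rows would read as a texture property and one built from color rows as a color property.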

[1]  Daniel N. Osherson, Joshua Stern, Ormond Wilkie, Michael Stob, Edward E. Smith.  Default Probability, 1991, Cogn. Sci.

[2]  Ali Farhadi, et al.  Describing objects by their attributes, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Gang Wang, et al.  Joint learning of visual attributes, object classes and visual saliency, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[4]  Eli Shechtman, et al.  Matching Local Self-Similarities across Images and Videos, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Pietro Perona, et al.  Visual Recognition with Humans in the Loop, 2010, ECCV.

[6]  Cordelia Schmid, et al.  Toward Category-Level Object Recognition, 2006, Toward Category-Level Object Recognition.

[7]  Daniel, et al.  Default Probability, 2004.

[8]  Zheng Xu, et al.  Mining visualness, 2013, 2013 IEEE International Conference on Multimedia and Expo (ICME).

[9]  Alexander C. Berg, et al.  Automatic Attribute Discovery and Characterization from Noisy Web Data, 2010, ECCV.

[10]  Cordelia Schmid, et al.  A Discriminative Framework for Texture and Object Recognition Using Local Image Features, 2006, Toward Category-Level Object Recognition.

[11]  Yair Weiss, et al.  Natural Images, Gaussian Mixtures and Dead Leaves, 2012, NIPS.

[12]  Andrew Zisserman, et al.  Learning Visual Attributes, 2007, NIPS.

[13]  Silvio Savarese, et al.  Recognizing human actions by attributes, 2011, CVPR 2011.

[14]  Michael I. Jordan, et al.  Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[15]  Christoph H. Lampert, et al.  Learning to detect unseen object classes by between-class attribute transfer, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Ulrike von Luxburg, et al.  A tutorial on spectral clustering, 2007, Stat. Comput.

[17]  G. Golub, et al.  Eigenvalue computation in the 20th century, 2000.

[18]  Shree K. Nayar, et al.  Attribute and simile classifiers for face verification, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[19]  Feiping Nie, et al.  Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization, 2010, NIPS.

[20]  Luc Van Gool, et al.  The 2005 PASCAL Visual Object Classes Challenge, 2005, MLCW.

[21]  Rongrong Ji, et al.  Weak attributes for large-scale image retrieval, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Pietro Perona, et al.  Graph-Based Visual Saliency, 2006, NIPS.

[23]  Shih-Fu Chang, et al.  Designing Category-Level Attributes for Discriminative Visual Recognition, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Andrew Zisserman, et al.  Representing shape with a spatial pyramid kernel, 2007, CIVR '07.

[25]  Jianguo Zhang, et al.  The PASCAL Visual Object Classes Challenge, 2006.

[26]  Larry S. Davis, et al.  Image ranking and retrieval based on multi-attribute queries, 2011, CVPR 2011.

[27]  Yang Wang, et al.  A Discriminative Latent Model of Object Classes and Attributes, 2010, ECCV.

[28]  Mubarak Shah, et al.  Complex Events Detection Using Data-Driven Concepts, 2012, ECCV.

[29]  D. Rubin, et al.  Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[30]  Luc Van Gool, et al.  The Pascal Visual Object Classes Challenge: A Retrospective, 2014, International Journal of Computer Vision.

[31]  Wei-Ying Ma, et al.  Duplicate-Search-Based Image Annotation Using Web-Scale Data, 2012, Proceedings of the IEEE.

[32]  Andrew W. Fitzgibbon, et al.  Efficient Object Category Recognition Using Classemes, 2010, ECCV.

[33]  Hao Su, et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification, 2010, NIPS.

[34]  Kristen Grauman, et al.  Relative attributes, 2011, 2011 International Conference on Computer Vision.