Joint learning of visual attributes, object classes and visual saliency

We present a method to learn visual attributes (eg.“red”, “metal”, “spotted”) and object classes (eg. “car”, “dress”, “umbrella”) together. We assume images are labeled with category, but not location, of an instance. We estimate models with an iterative procedure: the current model is used to produce a saliency score, which, together with a homogeneity cue, identifies likely locations for the object (resp. attribute); then those locations are used to produce better models with multiple instance learning. Crucially, the object and attribute models must agree on the potential locations of an object. This means that the more accurate of the two models can guide the improvement of the less accurate model. Our method is evaluated on two data sets of images of real scenes, one in which the attribute is color and the other in which it is material. We show that our joint learning produces improved detectors. We demonstrate generalization by detecting attribute-object pairs which do not appear in our training data. The iteration gives significant improvement in performance.

[1]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[2]  Thomas Hofmann,et al.  Support Vector Machines for Multiple-Instance Learning , 2002, NIPS.

[3]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[4]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Andrew Zisserman,et al.  An Affine Invariant Salient Region Detector , 2004, ECCV.

[6]  Jitendra Malik,et al.  Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons , 2001, International Journal of Computer Vision.

[7]  Pietro Perona,et al.  On the usefulness of attention for object recognition , 2004 .

[8]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[9]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[10]  Thorsten Joachims,et al.  Training linear SVMs in linear time , 2006, KDD '06.

[11]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[12]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  Cordelia Schmid,et al.  Learning Color Names from Real-World Images , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Andrew Zisserman,et al.  Learning Visual Attributes , 2007, NIPS.

[15]  Andrew Zisserman,et al.  Representing shape with a spatial pyramid kernel , 2007, CIVR '07.

[16]  Frédéric Jurie,et al.  Randomized Clustering Forests for Image Classification , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Larry S. Davis,et al.  Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers , 2008, ECCV.

[19]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.