End-to-End Localization and Ranking for Relative Attributes

We propose an end-to-end deep convolutional network to simultaneously localize and rank relative visual attributes, given only weakly-supervised pairwise image comparisons. Unlike previous methods, our network jointly learns the attribute’s features, localization, and ranker. The localization module of our network discovers the most informative image region for the attribute, which is then used by the ranking module to learn a ranking model of the attribute. Our end-to-end framework also significantly speeds up processing and is much faster than previous methods. We show state-of-the-art ranking results on various relative attribute datasets, and our qualitative localization results clearly demonstrate our network’s ability to learn meaningful image patches.

[1]  Nuno Vasconcelos,et al.  The discriminant center-surround hypothesis for bottom-up saliency , 2007, NIPS.

[2]  Larry S. Davis,et al.  Image ranking and retrieval based on multi-attribute queries , 2011, CVPR 2011.

[3]  Bernard Ghanem,et al.  On the relationship between visual attributes and convolutional networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Alexander C. Berg,et al.  Hipster Wars: Discovering Elements of Fashion Styles , 2014, ECCV.

[5]  Kristen Grauman,et al.  Relative attributes , 2011, 2011 International Conference on Computer Vision.

[6]  Kristen Grauman,et al.  Fine-Grained Visual Comparisons with Local Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Thomas Deselaers,et al.  What is an object? , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8]  Adriana Kovashka,et al.  Attribute Adaptation for Personalized Image Search , 2013, 2013 IEEE International Conference on Computer Vision.

[9]  Shiguang Shan,et al.  Relative Forest for Attribute Prediction , 2012, ACCV.

[10]  Ehsan Adeli,et al.  Deep Relative Attributes , 2015, ACCV.

[11]  Ali Farhadi,et al.  Attribute Discovery via Predictable Discriminative Binary Codes , 2012, ECCV.

[12]  C. V. Jawahar,et al.  Relative Parts: Distinctive Parts for Learning Relative Attributes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Ali Farhadi,et al.  Attribute-centric recognition for cross-category generalization , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[15]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[16]  Alex Graves,et al.  DRAW: A Recurrent Neural Network For Image Generation , 2015, ICML.

[17]  Shree K. Nayar,et al.  Attribute and simile classifiers for face verification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[18]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Shree K. Nayar,et al.  FaceTracer: A Search Engine for Large Collections of Images with Faces , 2008, ECCV.

[20]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[21]  Trevor Darrell,et al.  PANDA: Pose Aligned Networks for Deep Attribute Modeling , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Pierre Baldi,et al.  A principled approach to detecting surprising events in video , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[23]  Abhinav Gupta,et al.  Constrained Semi-Supervised Learning Using Attributes and Comparative Attributes , 2012, ECCV.

[24]  Gregory N. Hullender,et al.  Learning to rank using gradient descent , 2005, ICML.

[25]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[26]  Pietro Perona,et al.  Visual Recognition with Humans in the Loop , 2010, ECCV.

[27]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Nuno Vasconcelos,et al.  On the plausibility of the discriminant center-surround hypothesis for visual saliency. , 2008, Journal of vision.

[29]  Yuxin Peng,et al.  The application of two-level attention models in deep convolutional neural network for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Roberto Cipolla,et al.  DEEP-CARVING: Discovering visual attributes by carving deep neural nets , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[32]  Kun Duan,et al.  Discovering localized attributes for fine-grained recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Cristian Sminchisescu,et al.  Constrained parametric min-cuts for automatic object segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[34]  Nitish Srivastava,et al.  Learning Generative Models with Visual Attention , 2013, NIPS.

[35]  Donghoon Lee,et al.  Deep Attribute Networks , 2012, ArXiv.

[36]  Geoffrey E. Hinton,et al.  Zero-shot Learning with Semantic Output Codes , 2009, NIPS.

[37]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[38]  Alex Graves,et al.  Recurrent Models of Visual Attention , 2014, NIPS.

[39]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[41]  Yong Jae Lee,et al.  Discovering the Spatial Extent of Relative Attributes , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[42]  Adriana Kovashka,et al.  WhittleSearch: Image search with relative attribute feedback , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Subhransu Maji,et al.  Describing people: A poselet-based approach to attribute classification , 2011, 2011 International Conference on Computer Vision.

[44]  Ali Farhadi,et al.  Object-Centric Anomaly Detection by Attribute-Based Reasoning , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.