Fine-Grained Comparisons with Attributes

Given two images, we want to predict which exhibits a particular visual attribute more than the other—even when the two images are quite similar. For example, given two beach scenes, which looks more calm? Given two high-heeled shoes, which is more ornate? Existing relative attribute methods rely on global ranking functions. However, rarely will the visual cues relevant to a comparison be constant for all data, nor will humans’ perception of the attribute necessarily permit a global ordering. At the same time, not every image pair is even orderable for a given attribute. Attempting to map relative attribute ranks to “equality” predictions is nontrivial, particularly since the span of indistinguishable pairs in attribute space may vary in different parts of the feature space. To address these issues, we introduce local learning approaches for fine-grained visual comparisons, where a predictive model is trained on the fly using only the data most relevant to the novel input. In particular, given a novel pair of images, we develop local learning methods to (1) infer their relative attribute ordering with a ranking function trained using only analogous labeled image pairs, (2) infer the optimal “neighborhood,” i.e., the subset of the training instances most relevant for training a given local model, and (3) infer whether the pair is even distinguishable, based on a local model for just noticeable differences in attributes. Our methods outperform state-of-the-art methods for relative attribute prediction on challenging datasets, including a large newly curated shoe dataset for fine-grained comparisons. We find that for fine-grained comparisons, more labeled data is not necessarily preferable to isolating the right data.
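The first of the three ideas above, ranking a novel pair with a model trained on the fly from only analogous labeled pairs, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain Euclidean feature distances, an order-agnostic sum-of-distances rule for choosing neighboring pairs, and a linear ranker fit by gradient descent on a logistic loss. All function names and parameters (`pair_distance`, `fit_linear_ranker`, `local_compare`, `k`) are hypothetical.

```python
import numpy as np

def pair_distance(xa, xb, xi, xj):
    """Order-agnostic distance between query pair (xa, xb) and training pair (xi, xj).
    Assumption: sum of Euclidean distances under the better of the two alignments."""
    straight = np.linalg.norm(xa - xi) + np.linalg.norm(xb - xj)
    crossed = np.linalg.norm(xa - xj) + np.linalg.norm(xb - xi)
    return min(straight, crossed)

def fit_linear_ranker(diffs, epochs=200, lr=0.1, reg=1e-3):
    """Fit w so that w @ d > 0 for each row d = x_more - x_less.
    Assumption: logistic loss with L2 regularization, plain gradient descent."""
    w = np.zeros(diffs.shape[1])
    for _ in range(epochs):
        margins = np.clip(diffs @ w, -30.0, 30.0)  # clip to avoid overflow in exp
        weights = 1.0 / (1.0 + np.exp(margins))    # larger for poorly ranked pairs
        grad = -(diffs * weights[:, None]).mean(axis=0) + reg * w
        w -= lr * grad
    return w

def local_compare(xa, xb, X, pairs, k=20):
    """Return a score that is positive when xa is predicted to show more of the
    attribute than xb, using a ranker trained only on the k nearest labeled pairs.
    Each entry (i, j) of `pairs` means image i shows more of the attribute than j."""
    dists = np.array([pair_distance(xa, xb, X[i], X[j]) for i, j in pairs])
    nearest = np.argsort(dists)[:k]
    diffs = np.array([X[pairs[n][0]] - X[pairs[n][1]] for n in nearest])
    w = fit_linear_ranker(diffs)
    return float(w @ (xa - xb))

# Toy usage: the "attribute" is simply the first feature dimension.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
idx = rng.integers(0, 60, size=(200, 2))
pairs = [(i, j) if X[i, 0] > X[j, 0] else (j, i) for i, j in idx if i != j]
score = local_compare(np.array([1.0, 0.0]), np.array([0.5, 0.0]), X, pairs)
print("first image shows more of the attribute" if score > 0 else "second image does")
```

The point of the sketch is that the ranker `w` is refit for every query pair, so it can weight different visual cues in different regions of feature space. A global method would fit a single `w` on all of `pairs` once.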
