Similarity Comparisons for Interactive Fine-Grained Categorization

Current human-in-the-loop fine-grained visual categorization systems depend on a predefined vocabulary of attributes and parts, usually determined by experts. In this work, we move away from that expert-driven, attribute-centric paradigm and present a novel interactive classification system that incorporates computer vision and perceptual similarity metrics in a unified framework. At test time, users are asked to judge the relative similarity between a query image and various sets of images. These general queries do not require expert-defined terminology and are applicable to other domains and basic-level categories, enabling a flexible, efficient, and scalable system for fine-grained categorization with humans in the loop. Our system outperforms existing state-of-the-art systems for relevance-feedback-based image retrieval as well as interactive classification, reducing the average number of questions needed to correctly classify an image by up to 43%.
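The abstract does not specify the underlying model, so the following is only a minimal sketch of how relative-similarity answers could drive an interactive classifier. It rests on several assumptions that are not taken from the source. Each class has a hypothetical embedding vector. The prior over classes comes from some computer vision classifier. The probability that a user picks a displayed image as "most similar" to the query is an assumed softmax over negative squared distances, with a hypothetical sharpness parameter `sigma`. Each answer updates the class posterior by Bayes' rule. The next display is chosen greedily by minimizing expected posterior entropy. The names `response_likelihood`, `update_posterior`, `expected_entropy` and `pick_display` are illustrative and are not the authors' API.

```python
import math

def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def response_likelihood(class_emb, display_embs, sigma=1.0):
    """P(user picks each displayed image | true class).

    Assumed model (hypothetical): softmax over negative squared Euclidean
    distances between the class embedding and each displayed image.
    """
    scores = [-sigma * sum((a - b) ** 2 for a, b in zip(class_emb, d))
              for d in display_embs]
    m = max(scores)  # subtract the max for numerical stability
    return normalize([math.exp(s - m) for s in scores])

def update_posterior(prior, class_embs, display_embs, picked, sigma=1.0):
    """Bayes update of the class distribution after the user picks `picked`."""
    post = [p * response_likelihood(e, display_embs, sigma)[picked]
            for p, e in zip(prior, class_embs)]
    return normalize(post)

def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def expected_entropy(prior, class_embs, display_embs, sigma=1.0):
    """Expected posterior entropy if this display were shown next."""
    lik = [response_likelihood(e, display_embs, sigma) for e in class_embs]
    total = 0.0
    for j in range(len(display_embs)):
        p_answer = sum(p * l[j] for p, l in zip(prior, lik))
        if p_answer > 0:
            post = [p * l[j] / p_answer for p, l in zip(prior, lik)]
            total += p_answer * entropy(post)
    return total

def pick_display(prior, class_embs, candidate_displays, sigma=1.0):
    """Greedy choice: index of the display with lowest expected entropy."""
    return min(range(len(candidate_displays)),
               key=lambda i: expected_entropy(prior, class_embs,
                                              candidate_displays[i], sigma))
```

For example, with three classes embedded at `[0, 0]`, `[1, 0]` and `[0, 1]` and a uniform prior, `pick_display` prefers a display showing `[0, 0]` and `[1, 0]` over one showing two identical images, since identical images give no information. If the user then picks the first image, `update_posterior` lowers the probability of the class at `[1, 0]`. In a full loop one would repeat these two steps until the top posterior exceeds a chosen threshold.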
