Paper information - Similarity Comparisons for Interactive Fine-Grained Categorization

Current human-in-the-loop fine-grained visual categorization systems depend on a predefined vocabulary of attributes and parts, usually determined by experts. In this work, we move away from that expert-driven and attribute-centric paradigm and present a novel interactive classification system that incorporates computer vision and perceptual similarity metrics in a unified framework. At test time, users are asked to judge relative similarity between a query image and various sets of images. These general queries do not require expert-defined terminology and are applicable to other domains and basic-level categories, enabling a flexible, efficient, and scalable system for fine-grained categorization with humans in the loop. Our system outperforms existing state-of-the-art systems for relevance-feedback-based image retrieval as well as interactive classification, resulting in a reduction of up to 43% in the average number of questions needed to correctly classify an image.
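
The abstract does not spell out how a similarity judgment updates the system's belief about the class, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each class has a point in a learned perceptual embedding, that a computer-vision model supplies a prior over classes, and that the user picks the displayed image most similar to the query with probability decaying in squared embedding distance. The function name `update_posterior`, the `scale` parameter, and the toy embedding are all hypothetical.

```python
import numpy as np

def update_posterior(prior, embedding, displayed, chosen, scale=1.0):
    """Bayes update of a class posterior after one similarity question.

    prior     : (n,) class probabilities, e.g. from a vision model (assumed).
    embedding : (n, d) perceptual embedding, one row per class (assumed).
    displayed : indices of the classes whose images were shown to the user.
    chosen    : index (one of `displayed`) the user judged most similar.
    scale     : hypothetical sharpness of the assumed user response model.
    """
    prior = np.asarray(prior, dtype=float)
    embedding = np.asarray(embedding, dtype=float)
    displayed = list(displayed)
    shown = embedding[displayed]  # (k, d)

    # Squared distance from every candidate class to every displayed image.
    d2 = ((embedding[:, None, :] - shown[None, :, :]) ** 2).sum(axis=2)  # (n, k)

    # Assumed response model: a softmax over negative scaled distances,
    # computed stably by subtracting the row maximum.
    logits = -scale * d2
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # Likelihood of the observed choice under each candidate true class.
    likelihood = probs[:, displayed.index(chosen)]

    posterior = prior * likelihood
    return posterior / posterior.sum()

# Toy example: four classes on a line, uniform prior, two images displayed.
toy_embedding = np.array([[0.0], [1.0], [5.0], [6.0]])
toy_prior = np.full(4, 0.25)
toy_posterior = update_posterior(toy_prior, toy_embedding, displayed=[0, 2], chosen=2)
print(toy_posterior)
```

In this toy run the user selects the image of class 2 over that of class 0, so the probability mass moves toward classes 2 and 3, which lie near the chosen image in the assumed embedding. Repeating such updates with different displayed sets is one way the number of questions per image could be driven down.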
