Evaluation of Localized Semantics: Data, Methodology, and Experiments

Abstract We present a new data set of 1014 images with manual segmentations and semantic labels for each segment, together with a methodology for using this kind of data for recognition evaluation. The images and segmentations are from the UCB segmentation benchmark database (Martin et al., in International conference on computer vision, vol. II, pp. 416–421, 2001). The database is extended by manually labeling each segment with its most specific semantic concept in WordNet (Miller et al., in Int. J. Lexicogr. 3(4):235–244, 1990). The evaluation methodology establishes protocols for mapping algorithm specific localization (e.g., segmentations) to our data, handling synonyms, scoring matches at different levels of specificity, dealing with vocabularies with sense ambiguity (the usual case), and handling ground truth regions with multiple labels. Given these protocols, we develop two evaluation approaches. The first measures the range of semantics that an algorithm can recognize, and the second measures the frequency that an algorithm recognizes semantics correctly. The data, the image labeling tool, and programs implementing our evaluation strategy are all available on-line (kobus.ca//research/data/IJCV_2007). We apply this infrastructure to evaluate four algorithms which learn to label image regions from weakly labeled data. The algorithms tested include two variants of multiple instance learning (MIL), and two generative multi-modal mixture models. These experiments are on a significantly larger scale than previously reported, especially in the case of MIL methods. More specifically, we used training data sets up to 37,000 images and training vocabularies of up to 650 words. We found that one of the mixture models performed best on image annotation and the frequency correct measure, and that variants of MIL gave the best semantic range performance. We were able to substantively improve the performance of MIL methods on the other tasks (image annotation and frequency correct region labeling) by providing an appropriate prior.

[1]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[2]  George A. Miller,et al.  Introduction to WordNet: An On-line Lexical Database , 1990 .

[3]  David Yarowsky,et al.  One Sense Per Discourse , 1992, HLT.

[4]  G. Miller WordNet: A Lexical Database for English , 1992, HLT.

[5]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[6]  Philip Resnik,et al.  Using Information Content to Evaluate Semantic Similarity in a Taxonomy , 1995, IJCAI.

[7]  Eneko Agirre,et al.  A Proposal for Word Sense Disambiguation using Conceptual Distance , 1995, ArXiv.

[8]  David Yarowsky,et al.  Unsupervised Word Sense Disambiguation Rivaling Supervised Methods , 1995, ACL.

[9]  Christopher K. I. Williams,et al.  Using Bayesian neural networks to classify segmented images , 1997 .

[10]  Tomás Lozano-Pérez,et al.  A Framework for Multiple-Instance Learning , 1997, NIPS.

[11]  David W. Conrath,et al.  Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy , 1997, ROCLING/IJCLCLP.

[12]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13]  Shimon Edelman,et al.  Similarity-based Word Sense Disambiguation , 1998, CL.

[14]  Rada Mihalcea,et al.  Word Sense Disambiguation based on Semantic Density , 1998, WordNet@ACL/COLING.

[15]  Oded Maron,et al.  Learning from Ambiguity , 1998 .

[16]  Martin Chodorow,et al.  Combining local context and wordnet similarity for word sense identification , 1998 .

[17]  Oded Maron,et al.  Multiple-Instance Learning for Natural Scene Classification , 1998, ICML.

[18]  Christiane Fellbaum,et al.  Combining Local Context and Wordnet Similarity for Word Sense Identification , 1998 .

[19]  Dekang Lin,et al.  An Information-Theoretic Definition of Similarity , 1998, ICML.

[20]  Pietro Perona,et al.  Unsupervised Learning of Models for Recognition , 2000, ECCV.

[21]  David A. Forsyth,et al.  Learning the semantics of words and pictures , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[22]  Jitendra Malik,et al.  A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[23]  David A. Forsyth,et al.  Clustering art , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[24]  Qi Zhang,et al.  EM-DD: An Improved Multiple-Instance Learning Technique , 2001, NIPS.

[25]  Thomas Hofmann,et al.  Support Vector Machines for Multiple-Instance Learning , 2002, NIPS.

[26]  Thomas Hofmann,et al.  Multiple instance learning with generalized support vector machines , 2002, AAAI/IAAI.

[27]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[28]  R. Manmatha,et al.  Automatic Image Annotation and Retrieval using CrossMedia Relevance Models , 2003 .

[29]  R. Manmatha,et al.  A Model for Learning the Semantics of Pictures , 2003, NIPS.

[30]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[31]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[32]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[33]  David A. Forsyth,et al.  The effects of segmentation and feature choice in a translation model of object recognition , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[34]  Ted Pedersen,et al.  Extended Gloss Overlaps as a Measure of Semantic Relatedness , 2003, IJCAI.

[35]  Thomas Hofmann,et al.  Multiple-Instance Learning via Disjunctive Programming Boosting , 2003, NIPS.

[36]  Robert Wilensky,et al.  Experiments in Improving Unsupervised Word Sense Disambiguation , 2003 .

[37]  Stephen D. Scott,et al.  A Faster Algorithm for Generalized Multiple-Instance Learning , 2004, FLAIRS Conference.

[38]  A. Torralba,et al.  Sharing features: efficient boosting procedures for multiclass object detection , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[39]  Nando de Freitas,et al.  A Statistical Model for General Contextual Object Recognition , 2004, ECCV.

[40]  Yixin Chen,et al.  Image Categorization by Learning and Reasoning with Regions , 2004, J. Mach. Learn. Res..

[41]  N. V. Vinodchandran,et al.  An extended kernel for generalized multiple-instance learning , 2004, 16th IEEE International Conference on Tools with Artificial Intelligence.

[42]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[43]  Dan Roth,et al.  Learning to detect objects in images via a sparse, part-based representation , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  N. V. Vinodchandran,et al.  SVM-based generalized multiple-instance learning via approximate box counting , 2004, ICML.

[45]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[46]  Jitendra Malik,et al.  Shape matching and object recognition using low distortion correspondences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[47]  A. Volgenant,et al.  A shortest augmenting path algorithm for dense and sparse linear assignment problems , 1987, Computing.