Constructing Concept Lexica With Small Semantic Gaps

In recent years, constructing mathematical models for visual concepts by using content features, i.e., color, texture, shape, or local features, has led to the fast development of concept-based multimedia retrieval. In concept-based multimedia retrieval, defining a good lexicon of high-level concepts is the first and important step. However, which concepts should be used for data collection and model construction is still an open question. People agree that concepts that can be easily described by low-level visual features can construct a good lexicon. These concepts are called concepts with small semantic gaps. Unfortunately, there is very little research found on semantic gap analysis and on automatically choosing multimedia concepts with small semantic gaps, even though differences of semantic gaps among concepts are well worth investigating. In this paper, we propose a method to quantitatively analyze semantic gaps and develop a novel framework to identify high-level concepts with small semantic gaps from a large-scale web image dataset. Images with small semantic gaps are selected and clustered first by defining a confidence score and a content-context similarity matrix in visual space and textual space. Then, from the surrounding descriptions (titles, categories, and comments) of these images, concepts with small semantic gaps are automatically mined. In addition, considering that semantic gap analysis depends on both features and content-contextual consistency, we construct a lexicon family of high-level concepts with small semantic gaps (LCSS) based on different low-level features and different consistency measurements. This set of lexica is both independent to each other and mutually complimentary. LCSS is very helpful for data collection, feature selection, annotation, and modeling for large-scale image retrieval. It also shows a promising application potential for image annotation refinement and rejection. The experimental results demonstrate the validity of the developed concept lexica.

[1]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[2]  Yuxiao Hu,et al.  Efficient propagation for face annotation in family albums , 2004, MULTIMEDIA '04.

[3]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[4]  Dan Roth,et al.  Learning to detect objects in images via a sparse, part-based representation , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Laura A. Dabbish,et al.  Labeling images with a computer game , 2004, AAAI Spring Symposium: Knowledge Collection from Volunteer Contributors.

[6]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[7]  Marcel Worring,et al.  The challenge problem for automated detection of 101 semantic concepts in multimedia , 2006, MM '06.

[8]  John R. Smith,et al.  Large-scale concept ontology for multimedia , 2006, IEEE MultiMedia.

[9]  Changhu Wang,et al.  Scalable search-based image annotation of personal images , 2006, MIR '06.

[10]  Cordelia Schmid,et al.  Dataset Issues in Object Recognition , 2006, Toward Category-Level Object Recognition.

[11]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[12]  Rong Yan,et al.  How many high-level concepts will fill the semantic gap in news video retrieval? , 2007, CIVR '07.

[13]  Rong Yan,et al.  Can High-Level Concepts Fill the Semantic Gap in Video Retrieval? A Case Study With Broadcast News , 2007, IEEE Transactions on Multimedia.

[14]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[15]  Alexander Hauptmann,et al.  How many high-level concepts will fill the semantic gap in video retrieval ? , 2007 .

[16]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[17]  Gustavo Carneiro,et al.  Supervised Learning of Semantic Classes for Image Annotation and Retrieval , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  James Ze Wang,et al.  Real-Time Computerized Annotation of Pictures , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Qi Tian,et al.  What are the high-level concepts with small semantic gaps? , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Changhu Wang,et al.  Scalable search-based image annotation , 2008, Multimedia Systems.

[21]  Wei-Ying Ma,et al.  Annotating Images by Mining Image Search Results , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.