This paper presents novel methods for classifying images based on knowledge discovered from annotated images using WordNet. The novelty of this work lies in the automatic discovery of classes and in the combination of classifiers using the extracted knowledge. The extracted knowledge is a network of concepts (e.g., image clusters and word senses) with associated image and text examples. Statistically similar concepts are merged to reduce the size of the concept network. Our knowledge classifier is constructed by training a meta-classifier to predict the presence of each concept in images. A Bayesian network is then learned using the meta-classifiers and the concept network. For a new image, the presence of concepts is first detected using the meta-classifiers and then refined using Bayesian inference. Experiments show that combining classifiers using knowledge-based Bayesian networks yields accuracy that is superior (by up to 15%) or comparable to that of individual classifiers and of purely statistically learned classifier structures. Another contribution of this work is an analysis of the role of visual and text features in image classification. Because text features or joint text + visual features classify images better than visual features alone, we tried to predict text features for images without annotations; however, visual + predicted text features did not consistently improve accuracy over visual features alone.
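The refinement step described above (meta-classifier detections adjusted by Bayesian inference over the concept network) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a hypothetical two-concept network (a parent concept with one child concept), treats each meta-classifier output as a likelihood over the concept's absent/present states, and computes posteriors by exhaustive enumeration. The function name `refine` and all numbers are illustrative assumptions.

```python
from itertools import product


def refine(prior_parent, cpt_child, likelihoods):
    """Posterior P(concept present) for a toy network: parent -> child.

    prior_parent: P(parent = 1)
    cpt_child:    {parent_state: P(child = 1 | parent_state)}
    likelihoods:  {"parent": (P(obs | 0), P(obs | 1)),
                   "child":  (P(obs | 0), P(obs | 1))}
                  Hypothetical stand-ins for meta-classifier outputs.
    """
    joint = {}
    for p, c in product((0, 1), repeat=2):
        p_parent = prior_parent if p else 1.0 - prior_parent
        p_child = cpt_child[p] if c else 1.0 - cpt_child[p]
        # Weight the prior joint by the evidence from both detectors.
        evidence = likelihoods["parent"][p] * likelihoods["child"][c]
        joint[(p, c)] = p_parent * p_child * evidence

    z = sum(joint.values())  # normalizing constant
    post_parent = (joint[(1, 0)] + joint[(1, 1)]) / z
    post_child = (joint[(0, 1)] + joint[(1, 1)]) / z
    return post_parent, post_child


if __name__ == "__main__":
    # Illustrative values only: the parent detector is uninformative,
    # while the child detector favors "present".
    post_parent, post_child = refine(
        prior_parent=0.5,
        cpt_child={0: 0.1, 1: 0.8},
        likelihoods={"parent": (0.5, 0.5), "child": (0.2, 0.8)},
    )
    print(f"parent: {post_parent:.3f}, child: {post_child:.3f}")
```

In this toy setting, evidence for the child concept raises the posterior of the parent concept even when the parent's own detector is uninformative, which is the kind of cross-concept correction the Bayesian network is meant to provide. A network learned from a real concept graph would need a general inference routine rather than enumeration.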