Clustering art

We extend a recently developed method (K. Barnard and D. Forsyth, 2001) for learning the semantics of image databases using text and pictures. We incorporate statistical natural language processing in order to deal with free text. We demonstrate the current system on a difficult dataset, namely 10,000 images of work from the Fine Arts Museum of San Francisco. The images include line drawings, paintings, and pictures of sculpture and ceramics. Many of the images have associated free text whose content varies greatly, from physical description to interpretation and mood. We use WordNet to provide semantic grouping information and to help disambiguate word senses, as well as to emphasize the hierarchical nature of semantic relationships. This allows us to impose a natural structure on the image collection that reflects semantics to a considerable degree. Our method produces a joint probability distribution for words and picture elements. We demonstrate that this distribution can be used (a) to provide illustrations for given captions and (b) to generate words for images outside the training set. Results from this annotation process provide a quantitative evaluation of our method. Finally, the annotation process can be seen as a form of object recognition that has been learned through a partially supervised process.
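The role WordNet plays here can be illustrated with a small sketch. The snippet below is not the implementation used in the paper; it only shows how hypernym chains from WordNet (via the NLTK interface, assuming the WordNet corpus has been downloaded) group caption words under shared ancestors such as "artifact". The example words and the first-sense heuristic are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation): using WordNet hypernym
# chains to group caption words under shared ancestors. Requires NLTK with
# the WordNet corpus installed (nltk.download('wordnet')). Taking the first
# (most frequent) noun sense is a simplifying assumption.
from nltk.corpus import wordnet as wn

def hypernym_chain(word):
    """Hypernym path of the first noun sense of `word`, as synset names."""
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return []
    paths = synsets[0].hypernym_paths()
    return [s.name() for s in paths[0]] if paths else []

# Example caption words from a museum record (illustrative only).
for w in ["vase", "bowl", "sculpture"]:
    print(w, "->", hypernym_chain(w))
# The chains share ancestors such as 'artifact.n.01', which is the kind of
# hierarchical grouping information the system exploits.
```

In the paper, sense choices are further constrained by the surrounding free text; the sketch shows only the hierarchical lookup.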

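The annotation step can likewise be sketched in a few lines. Assuming the fitted model exposes cluster-conditional word probabilities and soft assignments of image segments ("blobs") to clusters, both hypothetical stand-ins for the hierarchical mixture fitted with EM in the paper, predicted words for an unseen image come from marginalizing over the clusters:

```python
# Minimal sketch of annotating an unseen image from a learned joint model.
# `assign_nodes` and `p_word_given_node` are hypothetical stand-ins for the
# model fitted in the paper; only the aggregation
# P(word | image) = sum_node P(node | blobs) * P(word | node) is shown.
from collections import defaultdict

def annotate(blob_features, assign_nodes, p_word_given_node, top_k=5):
    """Rank words for an image by marginalizing over model nodes."""
    node_posterior = assign_nodes(blob_features)   # {node: P(node | image blobs)}
    scores = defaultdict(float)
    for node, p_node in node_posterior.items():
        for word, p_word in p_word_given_node.get(node, {}).items():
            scores[word] += p_node * p_word
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

# Toy usage with made-up numbers:
toy_table = {"n1": {"vase": 0.4, "ceramic": 0.3}, "n2": {"painting": 0.6}}
print(annotate(None, lambda _: {"n1": 0.7, "n2": 0.3}, toy_table))
# Highest-scoring words, in order: 'vase', 'ceramic', 'painting'.
```

Auto-illustration works in the reverse direction, ranking images by the probability the model assigns to a caption's words.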
[1] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977.

[2] George A. Miller, et al. Introduction to WordNet: An On-line Lexical Database, 1990.

[3] David Yarowsky, et al. One Sense Per Discourse, 1992, HLT.

[4] Rohini K. Srihari. Extracting visual information from text: using captions to label faces in newspaper photographs, 1992.

[5] V. Govindaraju. A computational theory for locating human faces in photographs, 1992.

[6] Eric Brill, et al. A Simple Rule-Based Part of Speech Tagger, 1992, HLT.

[7] Venu Govindaraju, et al. Use of Collateral Text in Image Interpretation, 1994.

[8] Debra T. Burhans, et al. Visual Semantics: Extracting Visual Information from Text Accompanying Pictures, 1994, AAAI.

[9] Rohini K. Srihari, et al. Control Structures for Incorporating Picture-Specific Context in Image Interpretation, 1995, IJCAI.

[10] Eneko Agirre, et al. A Proposal for Word Sense Disambiguation using Conceptual Distance, 1995, ArXiv.

[11] David Yarowsky, et al. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods, 1995, ACL.

[12] Peter G. B. Enser, et al. Progress in Documentation: Pictorial Information Retrieval, 1995, J. Documentation.

[13] Michael J. Swain, et al. WebSeer: An Image Search Engine for the World Wide Web, 1996.

[14] Peter G. B. Enser, et al. Analysis of user need in image archives, 1997, J. Inf. Sci.

[15] Jitendra Malik, et al. Normalized cuts and image segmentation, 1997, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16] Rada Mihalcea, et al. Word Sense Disambiguation based on Semantic Density, 1998, WordNet@ACL/COLING.

[17] Thomas Hofmann, et al. Statistical Models for Co-occurrence Data, 1998.

[18] Thomas Hofmann, et al. Learning and Representing Topic: A Hierarchical Mixture Model for Word Occurrences in Document Databases, 1998.

[19] S. Sclaroff, et al. Combining textual and visual cues for content-based image retrieval on the World Wide Web, 1998, Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries.

[20] David A. Forsyth, et al. Computer Vision Tools for Finding Images and Video Sequences, 1999, Libr. Trends.

[21] Francine Chen, et al. Multi-Modal Browsing of Images in Web Documents, 1999.

[22] David A. Forsyth, et al. Learning the semantics of words and pictures, 2001, Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001).

[23] Jitendra Malik, et al. Blobworld: Image Segmentation Using Expectation-Maximization and Its Application to Image Querying, 2002, IEEE Trans. Pattern Anal. Mach. Intell.