Finding and Labeling the Subject of a Captioned Depictive Natural Photograph

We address the problem of finding the subject of a photographic image that is intended to illustrate some physical object or objects ("depictive") and was taken by usual optical means without magnification ("natural"). This could help in developing digital image libraries, since important image properties such as the size and color of a photograph's subject are not usually mentioned in accompanying captions, yet they can help rank the photographs retrieved for a user. We explore an approach that identifies the "visual focus" of the image and the "depicted concepts" in the caption, and then connects them. The visual focus is determined using eight domain-independent characteristics of regions in the segmented image, and the caption depiction is identified by a set of rules applied to the parsed and interpreted caption. The visual-focus determination also performs combinatorial optimization over sets of regions to find the set that best satisfies the focus criteria. Experiments on 100 randomly selected image-caption pairs show a significant improvement in retrieval precision over simpler methods and, in particular, emphasize the value of segmenting the image.
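The combinatorial step can be pictured with a small sketch. This is only an illustration under stated assumptions, not the method of the paper: the abstract does not name the eight region characteristics, so the two features used here (area fraction and distance from the image center), the region data, the target area of 0.25, and the scoring function are all hypothetical. The sketch enumerates every non-empty set of regions and keeps the one with the highest score.

```python
from itertools import combinations

# Hypothetical segmented regions: (name, area_fraction, center_distance).
# area_fraction is the share of the image covered by the region;
# center_distance is a normalized distance from the image center (0 = center).
# These two features are assumptions standing in for the paper's eight.
regions = [
    ("sky", 0.50, 0.45),
    ("aircraft_body", 0.20, 0.05),
    ("aircraft_tail", 0.05, 0.15),
    ("runway", 0.25, 0.40),
]

TARGET_AREA = 0.25  # assumed typical subject size, purely illustrative


def focus_score(subset):
    """Illustrative score: favor central region sets close to the target area."""
    total_area = sum(area for _, area, _ in subset)
    # Area-weighted mean distance of the set from the image center.
    mean_dist = sum(area * dist for _, area, dist in subset) / total_area
    return (1.0 - mean_dist) - abs(total_area - TARGET_AREA)


def best_focus_set(regions):
    """Exhaustively search all non-empty region sets for the best score."""
    candidates = (
        subset
        for k in range(1, len(regions) + 1)
        for subset in combinations(regions, k)
    )
    return max(candidates, key=focus_score)


best = best_focus_set(regions)
focus_names = sorted(name for name, _, _ in best)
print(focus_names)
```

Exhaustive enumeration is exponential in the number of regions, so it is only practical for a coarse segmentation; a real system would need pruning or a heuristic search.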
