The identification of index terms in natural language object descriptions

"The flowering part, it looks like someone is sticking their tongue out" (a subject's description of Arethusa bulbosa; see Figure 1).

The mechanisms that people use in natural settings to describe objects to one another can inform the design of image retrieval and museum systems. The image retrieval problem may be recast as an object description problem in which the images depict objects. This study examines the vocabulary and communication constructs that novices and domain experts use to describe objects in an object identification task. These human-centered devices may prove more understandable and easier to use than some purely computational approaches. The experimental conditions mimic a scenario in which a person queries an agent (an active botanical information resource) in natural language in order to identify plant images.

The analysis identified the objects of discourse (objects, parts, and relations), including analogies, exemplars, prototypical shapes, and shape modification predicates such as "longer" and "wider." In spoken language, novices and horticulturists use descriptive mechanisms similar to those in botanical text, but at different frequencies. For example, participants rely heavily on visual analogies to objects both within and outside the domain: "This looks like an X," where X is a plant (e.g., "daisy") or a non-plant (e.g., "butterfly" or "child's drawing of the sun").

The results suggest that indexing and retrieval systems should provide semantic-level similarity mechanisms to allow for whole-object as well as part-wise visual analogy. The systems should also provide a visual vocabulary: a set of images that represent prototypes of the verbal terms collected in this study.

Figure 1: Arethusa bulbosa (Dragon's-mouth). Panels: Listener's Set; Speaker's Set.
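The recommendation above, that retrieval systems index both whole-object and part-wise visual analogies, can be illustrated with a minimal sketch. All names here (the `AnalogyIndex` class, the image identifiers, the analogy terms) are hypothetical illustrations, not part of the study; the sketch only shows how "this looks like an X" descriptions could become index terms alongside conventional part vocabulary.

```python
from collections import defaultdict

class AnalogyIndex:
    """Hypothetical inverted index recording visual analogies as index terms,
    both for the whole object and for individual parts."""

    def __init__(self):
        self.whole = defaultdict(set)   # analogy term -> image ids
        self.parts = defaultdict(set)   # (part, analogy term) -> image ids

    def add(self, image_id, whole_analogies=(), part_analogies=()):
        # whole_analogies: terms like "daisy"; part_analogies: (part, term) pairs
        for term in whole_analogies:
            self.whole[term].add(image_id)
        for part, term in part_analogies:
            self.parts[(part, term)].add(image_id)

    def query_whole(self, term):
        return self.whole.get(term, set())

    def query_part(self, part, term):
        return self.parts.get((part, term), set())

# Example: index the subject's description of Arethusa bulbosa from Figure 1.
idx = AnalogyIndex()
idx.add("arethusa_bulbosa",
        whole_analogies=["orchid"],
        part_analogies=[("flower", "tongue sticking out")])

print(idx.query_part("flower", "tongue sticking out"))  # {'arethusa_bulbosa'}
```

A real system would additionally need the visual vocabulary the study proposes, i.e. prototype images attached to each verbal term, so that a query term and an index term can be matched at the semantic level rather than by exact string equality.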
