Image Retrieval as Linguistic and Nonlinguistic Visual Model Matching

This article reviews research on how people use mental models of images in an information retrieval environment. An understanding of these cognitive processes can aid a researcher in designing new systems and help librarians select systems that best serve their patrons. There are traditionally two main approaches to image indexing: concept-based and content-based (Rasmussen, 1997). The concept-based approach is used in many production library systems, while the content-based approach is dominant in research and in some newer systems. In the past, content-based indexing supported the identification of “low-level” features in an image. These features frequently do not require verbal labels. In many cases, current computer technology can create these indexes. Concept-based indexing, on the other hand, is a primarily verbal and abstract identification of “high-level” concepts in an image. This type of indexing requires the recognition of meaning and is primarily performed by humans. Most production-level library systems rely on concept-based indexing using keywords. Manual keyword indexing is, however, expensive and introduces problems with consistency. Recent advances have made some content-based indexing practical. In addition, some researchers are working on machine vision and pattern recognition techniques that blur the line between concept-based and content-based indexing. It is now possible to produce computer systems that allow users to search simultaneously on aspects of both concept-based and content-based indexes. The intelligent application of this technology requires an understanding of the user’s visual mental models of images and cognitive behavior.

P. Bryan Heidorn, Graduate School of Library and Information Science, University of Illinois, 501 E. Daniel, Champaign, IL 61820
LIBRARY TRENDS, Vol. 48, No. 2, Fall 1999, pp. 303-325
© 1999 The Board of Trustees, University of Illinois
INTRODUCTION

To better understand the relationship between concept-based and content-based indexing in a volume such as this, it is useful to refocus and re-evaluate image indexing. An understanding of these techniques may be unified by examining how each relates to “visual mental models.” From this perspective, image retrieval system work is an endeavor to create a concordance between an abstract indexing model of visual information and a person’s mental model of the same information. All visual information retrieval research, from the computational complexity of edge detectors to national standards for museum indexing of graphical material, is an attempt to bring the indexing model and the user’s mental model into line. All index abstractions, nonlinguistic or linguistic, may be classified by their success in matching the user’s abilities. Borgman (1986) emphasizes that retrieval systems should be designed around “natural” human thinking processes. The effectiveness of an index facet depends more on its harmonization with human cognition than on whether it is linguistic (concept-based) or nonlinguistic (content-based).

In describing the content of images in the realm of art, Panofsky (1955) distinguishes between pre-iconography, iconography, and iconology. Pre-iconographic content refers to the nonsymbolic or factual subject matter of an image. It includes the generic actions, entities, and entity attributes in an image. As an example, a pre-iconographic index may indicate that an image contains a stone (attribute), bridge (entity), and a river (entity). Iconographic content identifies individual or specific entities or actions. In the example, the bridge might be identified as the “Palmer Bridge” and the river as the “Hudson River.” The iconologic index would include the symbolic meaning of an image.
The image might be indexed as “peaceful” or as symbolizing “simpler times.” The indexing that is appropriate depends on the type of subject matter that the searchers will eventually have in mind when they are doing a search. This type of subject classification can be used to explain the strengths and weaknesses of content-based and concept-based indexing.

Computers frequently perform content-based indexing. Computers can cost-effectively identify image attributes such as color, texture, and layout. Historically, limitations in computer algorithms have limited computer indexing to just a fraction of the pre-iconographic content. This, however, is changing, and the challenge for researchers and developers is to expand the functionality of the systems. Within limited contexts, computer indexing has been able to move into iconographic subject matter. For example, by exploiting information in picture captions in newspapers, a system may identify individuals in an image (Srihari, 1995). Other systems can identify and index objects such as trees or horses using low-level features such as texture and symmetry (Forsyth et al., 1996).

Linguistic, concept-based indexing has traditionally been performed by humans. While it is expensive and time-consuming, it is possible to create indexes for all three types of content matter described by Panofsky. Hastings (1995) demonstrated that, in some retrieval situations, searchers use a combination of both visual and verbal features. With current technology, this means the use of both content-based and concept-based techniques. This article will focus on pre-iconographic indexing since this is the main area where content-based and concept-based techniques overlap. Content-based techniques may be used effectively where the computer can extract and synthesize features, attributes, and entities in images that are consistent with human understanding of the images.
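The contrast between machine-extractable attributes and human-assigned subject terms can be made concrete by writing the bridge example out as a single index record that carries both kinds of description. This is a minimal, hypothetical sketch: the field names, the four-pixel toy “image,” and the coarse eight-bin color quantization are illustrative assumptions, not features of any system discussed in this article.

```python
from collections import Counter

def coarse_color_histogram(pixels):
    """Quantize each RGB pixel into a 2x2x2 color cube (8 coarse bins).

    A crude stand-in for the low-level color features a computer can
    extract from pixel data without any human labeling.
    """
    bins = Counter()
    for r, g, b in pixels:
        bins[(r // 128, g // 128, b // 128)] += 1
    total = len(pixels)
    return {bin_key: n / total for bin_key, n in bins.items()}

# Toy four-pixel "image": mostly warm sunset tones plus one dark pixel.
pixels = [(250, 140, 60), (240, 120, 50), (200, 90, 40), (20, 20, 30)]

index_record = {
    # Content-based: computed directly from the pixel data.
    "color_histogram": coarse_color_histogram(pixels),
    # Concept-based: assigned by a human indexer, one field per
    # Panofsky level (pre-iconographic, iconographic, iconologic).
    "pre_iconographic": ["stone", "bridge", "river"],
    "iconographic": ["Palmer Bridge", "Hudson River"],
    "iconologic": ["peaceful", "simpler times"],
}
```

A retrieval system built over such records could match a query against the machine-computed histogram (content-based), the human-assigned terms (concept-based), or both simultaneously.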
The computer must model the image in a way that is isomorphic (but not identical) to the human model of the image. Human indexers and searchers must also shape representations or mental models of the images if the indexer is to produce a functional index. In order to demonstrate the importance and pervasiveness of this process, this article will explore two aspects of indexing: color and object naming (shape). The first section will discuss the cognitive and social processes that give rise to the visual mental models that are shared by indexers and searchers. The next section explains what is meant by mental models in this context. Following this is a discussion of the representation of objects and shapes in visual mental models and then how both content-based and concept-based indexes capture (or neglect) aspects of these models. This is followed by a discussion of color in mental models and then a discussion of the approaches to concept-based and content-based indexing by color.

IMAGE ACCESS AS A SOCIOCOGNITIVE PROCESS

Imagine an image of a bridge at sunset on a winter day. What color is the sky? Is there a name for the color? What objects are in the image? Are they important? Is the sun visible or has it already descended below the horizon? If you wanted to store this image with 100,000 others, how would you find it again? How would you describe it so that someone else could find it? Would words be enough? The answers to all of these questions depend on personal history and cultural expectations.

The act of indexing and accessing images from a database is a sociocognitive process grounded in both biology and experience. The term “sociocognitive” here means a combination of the social aspects of cognition as well as the individual aspects of mental life. Cognition refers to all processes involved in the perception, transformation, storage, retrieval, manipulation, and use of information by people.
Of particular interest here will be those aspects of cognition that are called mental models. In a social context, we often wish to communicate our thoughts to others. We frequently do this with language but also through our postures, gestures, or hand-drawn illustrations or, for the gifted, through works of art. Communication between people is an act of one person referencing and changing the representations used in the cognition of another person, what they are thinking about, and even how they are thinking. In this context, indexing is a form of communication between the indexer and the people who will search for images in a collection. The indexer must rely on both shared cognitive heritage and social conventions to represent salient aspects of an image in the indexing scheme. The searchers, in using the index, must express their interests in the same language that was used by the indexers.

In the first paragraph of this section of the article, you were asked, through natural language, to create a “visual mental model” or “image” in your mind. Each reader’s image is different, but certainly there are aspects of the image that are shared among readers. Some of these aspects may be based on the shared biology of our vision systems (most of us can imagine color), and some shared aspects may be attributable to our shared experience. We all know what bridges are without having been born with that knowledge. Some aspects of the visual mental model are easily described with natural language or verbal tags. Other aspects seem to defy simple linguistic description. “Although grammars provide devices for conveying rough topological information such as connectivity, contact, and containment, and coarse metric contrasts such as near/far or flat/globular, they are of very little help in conveying precise Euclidean relations: a picture is worth a thousand words” (Pinker & Bloom, 1995, p. 715).
This linguistic versus nonlinguistic contrast parallels concept-based and content-based indexing techniques. Understanding these mental models of images and how we can communicate information about them can enlighten us regarding content-based and concept-based indexing. Shera (1965) identified prerequisites for constructing a framework for indexing (an indexing vocabulary). These inc

REFERENCES

[1] Rajiv Mehrotra et al. Similar-Shape Retrieval in Shape Data Management, 1995, Computer.

[2] George Lakoff et al. Women, Fire, and Dangerous Things, 1987.

[3] A. Treisman et al. A feature-integration theory of attention, 1980, Cognitive Psychology.

[4] Corinne Jörgensen et al. The Visual Thesaurus in a Hypermedia Environment: A Preliminary Exploration of Conceptual Issues and Applications, 1991, ICHIM.

[5] David Marr. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, 2009.

[6] B. Tversky et al. Objects, parts, and categories, 1984.

[7] Zenon W. Pylyshyn. What the Mind’s Eye Tells the Mind’s Brain: A Critique of Mental Imagery, 1973.

[8] Alex Pentland et al. Photobook: Content-based manipulation of image databases, 1996, International Journal of Computer Vision.

[9] I. Biederman. Recognition-by-components: a theory of human image understanding, 1987, Psychological Review.

[10] Shih-Fu Chang et al. Tools and techniques for color image retrieval, 1996, Electronic Imaging.

[11] Shih-Fu Chang et al. VisualSEEk: a fully automated content-based image query system, 1997, MULTIMEDIA ’96.

[12] M. Tarr et al. Testing conditions for viewpoint invariance in object recognition, 1997, Journal of Experimental Psychology: Human Perception and Performance.

[13] Hayit Greenspan et al. Finding Pictures of Objects in Large Collections of Images, 1996, Object Representation in Computer Vision.

[14] Christos Faloutsos et al. QBIC project: querying images by content, using color, texture, and shape, 1993, Electronic Imaging.

[15] Kannan Ramchandran et al. Multimedia Analysis and Retrieval System (MARS) Project, 1996, Data Processing Clinic.

[16] P. Kay et al. The linguistic significance of the meanings of basic color terms, 1978.

[17] Rohini K. Srihari. Using Speech Input for Image Interpretation, Annotation, and Retrieval, 1996, Data Processing Clinic.

[18] Noreen H. Klein et al. Cognitive Reference Points in Consumer Decision Making, 1987.

[19] R. Shepard et al. Chronometric Studies of the Rotation of Mental Images, 1973.

[20] Samantha Kelly Hastings. Query Categories in a Study of Intellectual Access to Digitized Art Images, 1995.

[21] Dragutin Petkovic et al. Query by Image and Video Content: The QBIC System, 1995, Computer.

[22] Amarnath Gupta et al. Visual information retrieval, 1997, CACM.

[23] Christine L. Borgman. The user’s mental model of an information retrieval system: an experiment on a prototype online catalog, 1999, Int. J. Hum. Comput. Stud.

[24] Rajiv Mehrotra. Content-Based Image Modeling and Retrieval, 1996, Data Processing Clinic.

[25] Rohini K. Srihari. Automatic Indexing and Content-Based Retrieval of Captioned Images, 1995, Computer.

[26] S. Pinker et al. Natural language and natural selection, 1990, Behavioral and Brain Sciences.

[27] Barbara Tversky. Parts, Partonomies, and Taxonomies, 1989.

[28] R. Shepard. The mental image, 1978.

[29] William R. Hendee et al. Cognitive Interpretation of Visual Signals, 1997.

[30] S. M. Kosslyn. Visual images preserve metric spatial information: evidence from studies of image scanning, 1978, Journal of Experimental Psychology: Human Perception and Performance.

[31] W. D. Wright. Physiological Optics, 1958, Nature.