In the experiments on comparative visual search reported here, each half of a display contains simple geometrical objects of three different colors and forms. The two hemifields are identical except for one mismatch in either color or form. The subject's task is to find this difference. Eye-movement recording yields insight into the interaction of the mental processes involved in completing this demanding task. We present a hierarchical model of comparative visual search and its implementation as a computer simulation. The evaluation of simulation data shows that this Three-Level Model is able to explain about 98% of the empirical data collected in six different experiments.

Comparative Visual Search

Comparative visual search can be considered a complex variant of the picture-matching paradigm (Humphrey & Lupker, 1993). In picture-matching experiments, subjects are typically presented with pairs of images and have to indicate whether or not they show the same object. In comparative visual search, however, pairs of almost identical item distributions are to be compared, requiring subjects to switch between the two images several times before detecting a possible mismatch. The stimuli in the experiments reported here showed patterns of simple geometrical items on a black background. The items appeared in three different forms (triangles, squares, and circles) and three different colors (fully saturated blue, green, and yellow), each covering about 0.7 degrees of visual angle in diameter. The item locations were randomly generated while avoiding item contiguity as well as item overlap. Each stimulus picture consisted of two hemifields (11 x 16 degrees each) separated by a vertical white line. There were 30 items in each hemifield, equally balanced for color and form.
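As a concrete illustration, the stimulus construction described above might be sketched as follows. This is a minimal sketch, not the original stimulus generator: the `min_dist` spacing threshold and the rejection-sampling placement are assumptions standing in for the unspecified contiguity/overlap criterion.

```python
import math
import random

COLORS = ('blue', 'green', 'yellow')
FORMS = ('triangle', 'square', 'circle')

def generate_hemifield(n_items=30, width=16.0, height=11.0, min_dist=1.0):
    """Place n_items at random positions (in degrees of visual angle),
    balanced for color and form (10 of each color, 10 of each form).
    min_dist is a hypothetical spacing threshold used to avoid item
    contiguity and overlap; the original criterion is not specified."""
    colors = list(COLORS) * (n_items // 3)
    forms = list(FORMS) * (n_items // 3)
    random.shuffle(colors)
    random.shuffle(forms)
    items = []
    for color, form in zip(colors, forms):
        while True:  # rejection sampling until the spacing constraint holds
            x, y = random.uniform(0, width), random.uniform(0, height)
            if all(math.hypot(x - px, y - py) >= min_dist
                   for px, py, _, _ in items):
                items.append((x, y, color, form))
                break
    return items

def make_stimulus():
    """Copy the hemifield and introduce exactly one color or form mismatch."""
    left = generate_hemifield()
    right = [list(it) for it in left]  # translationally identical twin
    i = random.randrange(len(right))
    x, y, color, form = right[i]
    if random.random() < 0.5:  # color mismatch on half of the trials
        right[i][2] = random.choice([c for c in COLORS if c != color])
    else:                      # otherwise a form mismatch
        right[i][3] = random.choice([f for f in FORMS if f != form])
    return left, [tuple(it) for it in right]
```

With 30 items per hemifield, exact balance is possible per dimension (10 per color, 10 per form) but not per color-form combination, which is why colors and forms are shuffled independently here.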
The hemifields were translationally identical in the color, form, and spatial arrangement of the 30 items, with one exception: a single item always differed from its "twin" in the other hemifield, either in color or in form. The subjects' task was to find this single mismatch and to press a mouse button as soon as they detected it. Eye movements during comparative visual search were measured with the OMNITRACK1 system, which has a temporal resolution of 60 Hz and a spatial precision of about 0.6 degrees. Sixteen subjects participated in Experiment 1, each viewing 50 pictures. Subjects knew that the critical mismatch would be either in form or in color; they did not know, however, when to expect which kind of mismatch. In fact, 25 of the 50 trials contained a difference in form and 25 contained a difference in color. Experiments 2 to 6 differed from Experiment 1 in specific aspects (see Table 1) in order to provide comprehensive data on comparative visual search (cf. Pomplun, 1998; Pomplun et al., to appear).

Table 1: Six different experiments of comparative visual search

Experiment  Subjects  Trials per subject  Description
1           16        50                  No information about dimension of mismatch
2           20        60                  Subjects know dimension of mismatch in advance
3           16        60                  No entropy in irrelevant dimension
4           14        60                  Search for a match instead of a mismatch
5           16        50                  Mirror symmetry between hemifields
6           16        60                  Comparison of item groups of varying size

Figure 1: Example stimulus with the plotted visual scan path chosen by one of the subjects. Fixations are numbered; circle size signifies fixation duration.

Figure 1 shows an example stimulus for Experiment 1 with a subject's gaze trajectory superimposed on it. As the example suggests, subjects switch between the hemifields very often and tend to fixate groups of items rather than single items. Moreover, they prefer exhaustive, self-avoiding scan paths for optimal search efficiency.
For the quantitative analysis of eye movements, the independent variables local item density, local color entropy, and local form entropy were introduced. While local density indicates how closely packed the items are in a certain region of the stimulus, local entropy tells us to what extent different item features are mixed there. There are nine dependent variables, for example fixation duration (FD) and number of successive fixations within the same hemifield (SF). These variables make it possible to investigate the influence of local information content on a subject's eye movements. FD, for instance, increases with the local item density at the fixation point, but not with the local entropy values, indicating that short processes like single fixations are controlled by localization rather than identification processes. SF, however, depends on both density and entropy: It decreases with increasing density or entropy, i.e. with an increasing amount of information, at the first fixation point after switching between hemifields. The magnitude of this effect yields data about the capacity of visual working memory. Taken together, eye-movement analysis in comparative visual search allows us to investigate the interaction of several perceptual and cognitive processes during the completion of a demanding task. In order to test the hypotheses derived from the empirical results, a comprehensive model and its computer simulation are required.

The Three-Level Model

The Three-Level Model is not the first attempt to reproduce eye-movement patterns in comparative visual search. A simpler predecessor, the Random-Walk Model (Pomplun, 1998), directly incorporated several empirical eye-movement parameters (e.g. FD and saccade length) and their dependence on local stimulus features. The main shortcoming of the Random-Walk Model turned out to be its exclusion of higher cognitive levels, leading to unstructured search behavior.
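The local entropy measure can be illustrated with a short sketch. The original study does not state the exact formula in this passage, so the following assumes Shannon entropy over the color distribution of items within a fixed radius of a point; both the radius value and the hard cutoff (rather than, say, Gaussian weighting) are illustrative assumptions.

```python
import math
from collections import Counter

def local_color_entropy(items, point, radius=2.0):
    """Shannon entropy (in bits) of the color distribution among the
    items lying within `radius` degrees of `point`.

    `items` is a list of (x, y, color) tuples. The radius and the hard
    distance cutoff are hypothetical choices, not taken from the study.
    """
    nearby = [color for (x, y, color) in items
              if math.hypot(x - point[0], y - point[1]) <= radius]
    if not nearby:
        return 0.0  # no items in the region, hence no feature mixing
    n = len(nearby)
    return -sum((k / n) * math.log2(k / n)
                for k in Counter(nearby).values())
```

Local form entropy would be defined analogously over the items' forms. A region containing only one color yields entropy 0, while an even mix of the three colors yields the maximum of log2(3), about 1.58 bits.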
In contrast, subjects tend to structure their search, e.g. by favoring self-avoiding global scan paths. Another problem was the direct implementation of statistical properties of empirical eye-movement variables into the model. Although this can tell us to what extent these variables determine the subjects' gaze trajectories, it does not allow us to test our interpretations of the empirical findings. It is clearly more informative to use a model that incorporates only these interpretations, i.e. the assumed interaction of several perceptual and cognitive components, instead of the raw empirical data. Such a model should generate fixations and saccades on the basis of the assumed mental processes and their parameters derived from the empirical research. If the model is able to replicate the empirical eye-movement patterns, this supports our interpretations. The Three-Level Model is a phenomenological approach meeting these requirements. Its structure is essentially motivated by the inadequacy of its predecessor, which showed that different levels of processing during comparative visual search have to be distinguished. In addition to the rather schematic processes of perception, memorization, comparison, etc., a higher cognitive level must be taken into account, which is responsible for global planning processes.

Figure 2: Scheme of the Three-Level Model. The example stimulus contains only eight items per hemifield for the sake of clarity.

Consequently, the Three-Level Model incorporates a vertical organization of mental processes, i.e. a hierarchical scheme of functional modules, better in line with current views on human brain architecture (see, e.g., Velichkovsky, 1990; Gazzaniga, 1997). A further aspect of the model's vertical organization is the dissociation of eye movements and attention. It is a well-known fact that shifts of attention can be performed without moving the eyes (Wright & Ward, 1994; Tsal, 1983).
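To make the hierarchical division of labor concrete, the three levels might be skeletonized as follows. This is a purely illustrative sketch, not the original implementation: the class names, the greedy nearest-neighbor planner standing in for the TSP-like global strategy, and the use of the 10-degree discriminability range as a gaze-shift criterion are all assumptions layered on the description in the text.

```python
import math

# Hypothetical criterion: discriminability is stable up to ~10 deg eccentricity
ACUITY_RADIUS = 10.0

def centroid(group):
    """Mean position of a group of (x, y, color, form) items."""
    xs = [it[0] for it in group]
    ys = [it[1] for it in group]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

class UpperLevel:
    """Global strategy: orders item groups in the reference hemifield.
    Greedy nearest-neighbor is a crude stand-in for TSP-like scanning."""
    def plan(self, groups, start=(0.0, 0.0)):
        order, pos, remaining = [], start, list(groups)
        while remaining:
            g = min(remaining, key=lambda g: math.dist(pos, centroid(g)))
            remaining.remove(g)
            order.append(g)
            pos = centroid(g)
        return order

class IntermediateLevel:
    """Attention, memorization, and comparison: matches a memorized
    group against its twin and reports feature mismatches."""
    def compare(self, group, twin):
        return [(a, b) for a, b in zip(group, twin) if a[2:] != b[2:]]

class LowerLevel:
    """Eye movements: the gaze follows attention only when the attended
    group leaves the region of sufficient acuity."""
    def __init__(self, gaze=(0.0, 0.0)):
        self.gaze, self.fixations = gaze, 0
    def attend(self, group):
        c = centroid(group)
        if math.dist(self.gaze, c) > ACUITY_RADIUS:
            self.gaze, self.fixations = c, self.fixations + 1
```

The dissociation of attention and eye movements shows up in `LowerLevel.attend`: an attentional shift only triggers a fixation when the attended group falls outside the current acuity region.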
Accordingly, the finding that subjects fixate groups of items rather than single items might be due to "invisible" shifts of attention: While attention is successively allocated to all items in the display, it is not necessary to readjust the foveal gaze position at each of these steps to provide sufficient acuity. This assumption is supported by the results of studies (Pomplun, 1998) investigating the discriminability of the items used in comparative visual search: Reaction time and error rate for detecting color and form features do not vary significantly with retinal eccentricity between 0 and 10 degrees. Figure 2 presents the structure of the Three-Level Model. On the upper level, the global strategy is planned and realized. Presumably, one of the hemifields is used as a reference for this purpose; hence, the global scan path is plotted only in the left hemifield. The intermediate level is concerned with shifts of attention and with processes of memorization and comparison. While the global course of processing is determined by the upper level, the local attentional shifts within and between the hemifields, needed for memorizing and comparing item features, are conducted at this intermediate level. Finally, the lower level is responsible for actually executing eye movements. The gaze position follows the attentional focus, if necessary, to provide appropriate visual acuity for the processing of information. Fixation points are chosen in such a way that the next group of items to be inspected can be memorized or compared with as few fixations as possible. The integration of the three individual levels into a single model is described in the following sections.

The Upper Level: Global Strategy

The model's global scanning strategy is based on the Color TSP Scanning Model developed in previous research (Pomplun, 1998). It was found that subjects tend to scan a display of randomly distributed items in a "traveling salesman" fashion, i.e.
using scan paths of minimal length. Moreover, subjects can take advantage of the items' colors. If the colors are clustered within the display, i.e. if there are separate areas of blue, yellow, and green items, subjects tend to completely scan each of these areas before proceeding to the next one. This strategy reduces their memory load,
References

[1] Ritter, H. J., et al. (2001). Comparative visual search: A difference that makes a difference. Cognitive Science.
[2] Cosmides, L. (1995). From: The Cognitive Neurosciences.
[3] Pomplun, M., et al. (1998). Analysis and models of eye movements in comparative visual search.
[4] Ward, L. M., et al. (1994). Shifts of visual attention: An historical and methodological overview.
[5] Kaufman, L., et al. (1986). Handbook of perception and human performance.
[6] Humphrey, G. K., et al. (1993). Codes and operations in picture matching. Psychological Research.
[7] Rutherford, A. (1987). Handbook of perception and human performance. Vol. 1: Sensory processes and perception. Vol. 2: Cognitive processes and performance.
[8] Velichkovsky, B. (1990). The vertical dimension of mental functioning.
[9] Tsal, Y. (1983). Movements of attention across the visual field. Journal of Experimental Psychology: Human Perception and Performance.