Discriminating figure from ground: The role of edge detection and region growing

Three general classes of algorithms have been proposed for figure/ground segregation. One class attempts to delineate figures by searching for edges, whereas another attempts to grow homogeneous regions; the third class consists of hybrid algorithms, which combine both procedures in various ways. The experiment reported here demonstrated that humans use a hybrid algorithm that makes use of both kinds of processes simultaneously and interactively. This conclusion follows from the patterns of response times observed when humans tried to recognize degraded polygons. Blurring the edges selectively impaired the edge-detection process, and imposing noise over the figure and background selectively impaired the region-growing process. By varying the amounts of both sorts of degradation independently, the interaction between the two processes was observed.

One of the fundamental purposes of vision is to allow us to recognize objects. Recognition occurs when sensory input accesses the appropriate memory representations, which allows one to know more about the stimulus than is apparent in the immediate input (e.g., its name). Before visual input can be compared to previously stored information, the regions of the image likely to correspond to a figure must be segregated from those comprising the background. The initial input from the eyes is in many ways like a bit-map image in a computer, with only local properties being represented by the activity of individual cells; only after the input is organized into larger groups, which are likely to correspond to objects and parts thereof, can it be encoded into memory and compared to stored representations of shape. Thus, an understanding of the processes that segregate figure from ground is of fundamental importance for understanding the nature of perception.

Researchers in computer vision have faced the same problem of segregating figure from ground, and in this report we explore whether the human brain uses some of the algorithms they have developed. In computer vision, the input is a large intensity array, with a number representing the intensity of light at each point in the display. Two broad classes of algorithms have been devised to organize this welter of input into regions likely to correspond to objects. One class contains edge-based algorithms (1-3). These algorithms look first for sharp changes in intensity (i.e., maxima in the first derivative or zero crossings in the second derivative of the function relating intensity to position), which are assumed to correspond to edges. In the Marr-Hildreth theory (3), these changes are observed at multiple scales of resolution and, if present at each, are taken to indicate edges (and not texture or the like). The local points of sharp change are then connected, resulting in a depiction of edges that are assembled into the outlines of objects. The other class contains the so-called region-based algorithms (4-7). These algorithms construct regions by growing and splitting areas that are maximally homogeneous; they compute not derivatives of intensity but rather homogeneity measures, such as intensity variance. In short, the first class of algorithms tries to delineate regions by discovering edges, whereas the second delineates edges by discovering regions.
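For concreteness, a minimal sketch of an edge-based scheme of the Marr-Hildreth type might look as follows. This is an editorial illustration, not the implementation of refs. 1-3; the scale choices and function names are our own. It smooths the image with Gaussians at several scales, takes the Laplacian, and keeps only those zero crossings present at every scale.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def zero_crossings(log_img):
    """Mark pixels where the Laplacian-of-Gaussian changes sign
    against the neighbor below or to the right."""
    s = np.sign(log_img)
    zc = np.zeros_like(log_img, dtype=bool)
    zc[:-1, :] |= s[:-1, :] != s[1:, :]   # vertical sign change
    zc[:, :-1] |= s[:, :-1] != s[:, 1:]   # horizontal sign change
    return zc

def marr_hildreth_edges(img, sigmas=(1.0, 2.0, 4.0)):
    """Edge map: zero crossings of the LoG that appear at every scale,
    taken (per the theory) as evidence of a true edge rather than texture."""
    maps = [zero_crossings(gaussian_laplace(img.astype(float), s)) for s in sigmas]
    return np.logical_and.reduce(maps)
```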
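A region-based algorithm can be sketched in a similarly hedged way: starting from a seed pixel, a region is grown by absorbing neighbors whose intensity stays within a homogeneity criterion. Here the criterion is a fixed tolerance around the running region mean; the tolerance and 4-connected neighborhood are illustrative assumptions, not a specific algorithm from refs. 4-7.

```python
import numpy as np
from collections import deque

def grow_region(img, seed, tol=0.1):
    """Grow a region from `seed`, absorbing 4-connected neighbors whose
    intensity lies within `tol` of the running region mean."""
    h, w = img.shape
    member = np.zeros((h, w), dtype=bool)
    member[seed] = True
    total, count = float(img[seed]), 1
    frontier = deque([seed])
    while frontier:
        y, x = frontier.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not member[ny, nx]:
                if abs(img[ny, nx] - total / count) <= tol:
                    member[ny, nx] = True
                    total += float(img[ny, nx])
                    count += 1
                    frontier.append((ny, nx))
    return member
```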
Investigations of the neurophysiology of vision provide strong evidence that mammalian brains use algorithms of the first class. Hubel and Wiesel's (8) "simple cells" in striate cortex seem to be part of an implementation of an edge-based algorithm (compare ref. 9). These cells detect sharp changes in intensity. However, both the linking of local points of sharp change into larger edges and the growing of regions require a more global organization of the image. Recent work (10) suggests that some such global processes are carried out in area V2, but the findings do not indicate clearly which algorithm is implemented there.

The experiment reported here uses a psychological approach to investigate whether one or both of these algorithms model the way humans segregate figure from ground. The experiment was designed to discriminate among six alternative hypotheses: the human brain organizes visual input solely by an edge-based algorithm; solely by a region-based algorithm; by whichever algorithm succeeds most quickly; by neither algorithm; by both algorithms, with one following the other; or by both algorithms operating simultaneously and interactively. In addition, the experiment provides numerical evidence for evaluating various models of the simultaneous functioning of the two algorithms.

Subjects were asked to judge whether light polygons on a dark background were the same as or different from a target shape. With the average intensities inside and outside the figures held constant, the edges of the test stimuli were blurred to a greater or lesser degree, and the variability in the intensity of the points composing the figure and ground was manipulated by superimposing greater or lesser amounts of noise. If the brain parses the input using edge detection, then the sharpness of the gradient from ground to figure should be critical, with greater blur resulting in more time and errors. Similarly, if the brain uses region growing, then the overlap in intensity variability between figure and ground should be critical, with greater overlap resulting in more time and errors. Finally, different forms of interaction between the two variables will indicate whether the two algorithms are used independently or interactively.

In designing this experiment, we were aware that very large amounts of superimposed variability begin to introduce spurious irregular edges all over the stimulus, and very large amounts of blur wipe out the shape of the region. However, these are second-order effects: provided that the noise and blur are not too extreme, properly aligned simple-cell-type edge detectors will respond equally strongly to a sharp edge with or without superimposed noise, and only weakly to the noise alone. Similarly, with a blurred edge of limited width w, region-growing algorithms will immediately group the parts of the figure and ground lying farther than w from the edge.
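The claim about edge detectors can be checked numerically. The sketch below is our own illustration (the filter scale and noise amplitude are arbitrary, not the paper's stimulus parameters): it compares the peak response of a one-dimensional derivative-of-Gaussian filter, a crude stand-in for a properly aligned simple cell, to a clean step, a noisy step, and noise alone.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
step = np.zeros(512)
step[256:] = 0.4                    # a 0.3-to-0.7 step, as in the stimuli
noise = 0.05 * rng.standard_normal(512)

def peak_edge_response(signal, sigma=4.0):
    """Peak magnitude of the derivative-of-Gaussian response."""
    return np.abs(gaussian_filter1d(signal, sigma, order=1)).max()

print(peak_edge_response(step))          # clean step
print(peak_edge_response(step + noise))  # step plus noise: nearly the same
print(peak_edge_response(noise))         # noise alone: much weaker
```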
The stimuli consisted of nine simple geometric shapes, such as a triangle and a diamond. The stimuli were initially computed as 512 × 512 images on a VAX computer and displayed on an AED color graphics monitor. The polygons were normalized to a perimeter of 950 pixels (hence varying in area) and centered on the screen.

Letting 0 represent black and 1 represent the brightest output of the monitor, the mean intensity of the interior of the figures was always 0.7, whereas the mean intensity of the ground was always 0.3. The edges were blurred by convolving the image with four Gaussian filters, g(i), with spatial standard deviations of 0, 4, 8, and 12 pixels. A noise signal n was computed by using a Fourier series with independent normally distributed random coefficients a(i, j) such that
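The specification of the Fourier-series noise is cut off above. As a hedged sketch of the degradation pipeline only, the two manipulations amount to the following; a white Gaussian field is used here purely as a placeholder for the paper's Fourier noise, which we cannot reconstruct from the truncated text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(img, blur_sigma, noise_amp, rng=None):
    """Blur the stimulus with a Gaussian of the given spatial standard
    deviation (0, 4, 8, or 12 pixels in the experiment), then superimpose
    noise. NOTE: the paper specifies Fourier-series noise with normally
    distributed coefficients a(i, j); white Gaussian noise stands in here."""
    if rng is None:
        rng = np.random.default_rng(0)
    blurred = img if blur_sigma == 0 else gaussian_filter(img, blur_sigma)
    return blurred + noise_amp * rng.standard_normal(img.shape)
```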