Parts, objects and scenes: computational models and psychophysics

In this thesis, I develop computational models to account for several different visual phenomena: (i) perception of object parts, (ii) eye movements when learning new objects, (iii) perceived shape similarity, and (iv) rapid scene identification. The validity of each model is tested using human psychophysical techniques.

Beginning with parts, I note an ecological fact: parts of objects tend to be convex. Several elaborate rules have been proposed to account for our perception of parts, but they can all be understood as attempts to exploit convexity. I propose a model that finds convex subregions within the bounding contour of the object. The segmentations produced by the model are quantitatively compared with a large-scale data set of human object segmentations using a precision-recall framework. The model's results fall within the range of subject-to-subject variability, demonstrating that a simple convexity rule can account for human perception of parts.

It has been suggested that we use parts to encode and retrieve object memories. By studying eye movements as observers learn new objects, one can investigate the strategies they adopt and what information is useful for encoding object memories. I argue that, in fact, observers employ a strategy of sequential information maximization to reduce uncertainty about the orientations of the object contour. I collect eye movement data as subjects learn novel object silhouettes and compare them with the fixations of a biologically motivated dynamic model. Model fixations are drawn away from predictable (straight) contours and toward angles near points of high curvature, similar to observed human behavior. The model collects information too efficiently, however; by adjusting its parameters until it matches human performance, I can probe the limits of human sensitivity.

Objects may be recognized over successive fixations of their parts, or by matching their overall shapes, as implicitly suggested by the eye movement model. Matching objects would be straightforward if we had a perceptual shape metric that captured the perceived similarity between two shapes. I examine two measures from the statistics and mathematics literature and ask whether they can represent human just-noticeable differences in shape space. I find that shape discrimination thresholds are stable when measured with these metrics, for both systematic and random shape changes. This suggests that the metrics are useful for gauging the perceptual similarity of two closely related shapes.

Finally, I consider the phenomenon of rapid scene identification. Subjects are able to get the gist of a scene within a single fixation. I propose a model that learns to categorize scenes based only on the responses they evoke in V1-like filters. The model performs above chance, demonstrating that categorization could begin as early as V1. When compared with human performance on a scene identification task, the model performs much like observers who had between 37 and 50 ms of exposure to the image.
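
The convexity idea lends itself to a compact illustration. The toy sketch below, which is not the thesis model, classifies vertices of a closed polygonal contour as convex or concave from the sign of the cross product of successive edge vectors; runs of concave vertices are natural candidates for boundaries between convex parts. The function name and the counter-clockwise convention are assumptions for illustration.

```python
import numpy as np

def concave_vertices(contour):
    """contour: (N, 2) array of points tracing the shape counter-clockwise."""
    prev_edge = contour - np.roll(contour, 1, axis=0)   # edge arriving at each vertex
    next_edge = np.roll(contour, -1, axis=0) - contour  # edge leaving each vertex
    # z-component of the cross product between incoming and outgoing edges
    cross = prev_edge[:, 0] * next_edge[:, 1] - prev_edge[:, 1] * next_edge[:, 0]
    return np.where(cross < 0)[0]  # negative turn = concave vertex on a CCW contour
```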
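
The precision-recall comparison between model and human segmentations can be sketched as follows. This is a minimal stand-in for the actual evaluation: model part-boundary points are scored against pooled human-marked boundary points, counting a match when a point falls within a small distance tolerance. The names and the tolerance value are illustrative assumptions.

```python
import numpy as np

def precision_recall(model_pts, human_pts, tol=2.0):
    """model_pts, human_pts: (N, 2) arrays of boundary point coordinates."""
    def matched(src, ref):
        # fraction of src points lying within `tol` pixels of some ref point
        d = np.linalg.norm(src[:, None, :] - ref[None, :, :], axis=2)
        return np.mean(d.min(axis=1) <= tol)

    precision = matched(model_pts, human_pts)  # model points confirmed by humans
    recall = matched(human_pts, model_pts)     # human points recovered by the model
    f = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f
```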
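
Read as a greedy procedure, sequential information maximization amounts to repeatedly fixating wherever residual uncertainty about contour orientation is greatest. The toy version below makes that reading explicit; the per-point entropies, the Gaussian information-uptake profile around a fixation, and all parameter values are invented for illustration and do not reproduce the dynamic model described above.

```python
import numpy as np

def choose_fixations(entropy, n_fix=5, uptake_sigma=10.0):
    """entropy: per-contour-point uncertainty (bits); returns chosen point indices."""
    entropy = entropy.astype(float).copy()
    idx = np.arange(len(entropy))
    fixations = []
    for _ in range(n_fix):
        target = int(np.argmax(entropy))           # most uncertain contour point
        fixations.append(target)
        # information gained falls off with distance from the fixated point
        gain = np.exp(-0.5 * ((idx - target) / uptake_sigma) ** 2)
        entropy *= (1.0 - 0.9 * gain)              # reduce local uncertainty
    return fixations
```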
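
The abstract does not name the two shape measures, so as a neutral illustration of the kind of metric involved, the sketch below computes a Procrustes-style distance between two corresponded boundary point sets (remove location and scale, align orientation, measure the residual). It is not necessarily one of the measures studied in the thesis.

```python
import numpy as np

def procrustes_distance(A, B):
    """A, B: (N, 2) arrays of corresponding boundary points."""
    A = A - A.mean(axis=0)           # remove location
    B = B - B.mean(axis=0)
    A = A / np.linalg.norm(A)        # remove scale
    B = B / np.linalg.norm(B)
    # optimal orthogonal alignment of B to A (Procrustes via SVD)
    U, _, Vt = np.linalg.svd(B.T @ A)
    R = U @ Vt
    return np.linalg.norm(A - B @ R)
```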
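
The scene model's front end can be approximated by the kind of Gabor filter bank commonly used as a stand-in for V1. The sketch below filters an image at a few orientations and scales, rectifies and pools the responses over a coarse spatial grid, and returns a feature vector on which a linear classifier could be trained; the specific filter parameters, pooling grid, and classifier choice are assumptions, not details from the thesis.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor(size, wavelength, theta, sigma):
    """Cosine-phase Gabor patch of odd side length `size`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)

def v1_features(img, n_orient=4, scales=(4, 8), grid=4):
    """Rectified, grid-pooled Gabor responses for a grayscale image array."""
    feats = []
    for wl in scales:
        for k in range(n_orient):
            kernel = gabor(4 * wl + 1, wl, np.pi * k / n_orient, wl / 2)
            resp = np.abs(fftconvolve(img, kernel, mode="same"))  # rectified response
            h, w = resp.shape
            for i in range(grid):                                  # coarse spatial pooling
                for j in range(grid):
                    feats.append(resp[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid].mean())
    return np.array(feats)
```

A linear classifier (for example, logistic regression) trained on such feature vectors would then predict the scene category directly from these early, V1-like responses.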