The development of perceptual integration in visual robots

Our eyes see well only what is directly in front of them; they must continually scan the faces, words, and objects around us. Perceptual integration is the process of combining the resulting jumpy, incomplete images into our stable, comprehensive perception of the world. Visual robots, whose goals and designs are becoming more life-like, share this need. This thesis presents IRV, a visual robot that integrates visual information across camera movements.

To achieve robust and accurate perceptual integration, IRV learns to solve the problem from experience: a series of random movements of a camera, mounted on a motorized pan-tilt platform, that observes the day-to-day activity in a laboratory. Learning proceeds without a prior analytic model, external calibration references, or a contrived environment. Because the solution is learned under minimal geometric assumptions, it can compensate for arbitrary imaging distortions, including lens aberrations, rotation of the camera about its viewing axis, and spatially varying or even random sampling patterns. IRV develops an accurate model of its own visual-motor geometry by learning to predict the sampled images that follow each random, but precisely known, camera movement (sketched in code below). By gradually accumulating evidence over repeated practice movements, IRV overcomes the ambiguity inherent in real-world perceptual-motor learning.

The computational basis of perceptual integration itself is a connectionist visual memory that continuously transforms visual information from previous fixations into a reference frame centered on the current viewing direction. Both learning and performance exploit a motor metric, which associates pairs of points in visual space with eye-movement parameters, to establish an interpretable, linear visual representation (a second sketch below illustrates the remapping and the motor metric). The computational architecture and learning mechanism, together with the natural environment, approximate the conditions of biological perceptual development. Both perceptual learning and mature performance exhibit time and space complexities commensurate with human abilities and resources. Experiments confirm the practicality of visual robots that learn to perceive the stability of the world despite eye movements, learn to integrate geometric features across fixations, and, in general, develop and calibrate accurate models of their own perceptual-motor systems.
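The following sketch, in Python, shows one way the predictive learning loop described above could look; it is an illustration, not IRV's implementation, and every name in it (sample_image, the linear predictor W, the toy scene model) is hypothetical. After each random but precisely known movement, a predictor of the next sampled image is corrected by its error against the image actually observed, accumulating evidence over many practice movements.

    import numpy as np

    rng = np.random.default_rng(0)

    N_SAMPLES = 64      # photoreceptor-like image samples; the pattern may be irregular
    N_EPOCHS = 2000
    LEARNING_RATE = 0.01

    # Stand-in for the world and sensor: a fixed random response to gaze
    # direction, unknown to the learner.
    scene = rng.normal(size=(N_SAMPLES, 2))

    def sample_image(gaze):
        """Hypothetical sensor model: image samples as a function of gaze (pan, tilt)."""
        return np.tanh(scene @ gaze)

    # Linear predictor of the next image from (current image, movement),
    # trained by gradient descent on prediction error.
    W = np.zeros((N_SAMPLES, N_SAMPLES + 2))

    gaze = np.zeros(2)
    for _ in range(N_EPOCHS):
        image = sample_image(gaze)
        move = rng.uniform(-0.2, 0.2, size=2)     # random but precisely known movement
        gaze = np.clip(gaze + move, -1.0, 1.0)    # pan-tilt platform limits
        target = sample_image(gaze)               # image actually observed after moving
        x = np.concatenate([image, move])
        error = target - W @ x
        W += LEARNING_RATE * np.outer(error, x)   # accumulate evidence over practice

    print("final mean prediction error:", np.abs(error).mean())

Because the movements are random, no single trial identifies the visual-motor geometry; the gradual averaging of errors over repeated trials is what resolves the ambiguity.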
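A second sketch illustrates the remapping performed by the visual memory and the motor metric between remembered features. It assumes, for simplicity, that gaze shifts act additively on visual directions; IRV itself learns this geometry rather than assuming it. The class TransSaccadicMemory and its methods are invented for this example.

    import numpy as np

    class TransSaccadicMemory:
        """Hypothetical visual memory keyed by gaze-centered coordinates."""

        def __init__(self):
            self.features = []   # list of (position relative to current gaze, label)

        def store(self, position, label):
            self.features.append((np.asarray(position, dtype=float), label))

        def remap(self, movement):
            """Shift all remembered positions opposite to the gaze movement."""
            self.features = [(pos - movement, label) for pos, label in self.features]

        def motor_distance(self, a, b):
            """Motor metric: the eye movement separating two remembered features."""
            return np.linalg.norm(self.features[a][0] - self.features[b][0])

    memory = TransSaccadicMemory()
    memory.store([0.0, 0.0], "cup")          # object at the current fixation point
    memory.remap(np.array([0.3, -0.1]))      # camera pans right, tilts up slightly
    memory.store([0.0, 0.0], "keyboard")     # new object now at fixation
    print(memory.features)                   # cup is now at (-0.3, 0.1) in the new frame
    print(memory.motor_distance(0, 1))       # movement needed to look back at the cup

Measuring distances between remembered points by the eye movement that would carry gaze from one to the other is what keeps the representation linear and interpretable in motor terms.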