Dynamic appearance-based vision

Theories of vision have traditionally confined themselves to the passive analysis of static images, focusing on the extraction of task-independent 3D reconstructions of the visual world. However, the images projected on the retina are seldom unrelated static signals. Rather, they are measurements of a coherent and continuous stream of events in the visual environment, constrained both by the physical laws of nature and by the observer's actions on the immediate environment. In short, vision is inherently a dynamic process.

In this thesis, we propose two related theories of dynamic visual perception. The first theory exploits the ability to make eye movements to dynamically explore the visual world. The resulting architecture uses appearance-based models of objects in lieu of hand-coded 3D models. It employs two visual routines to solve visual cognitive tasks, one for object identification and another for object location. The second theory, which can be seen as an elaboration of the first, is based directly on the premise that vision is a stochastic, dynamic process. The task of visual perception then reduces to two dual problems: optimally estimating visual events occurring in the external environment and, on a longer time scale, learning efficient internal models of the environment. Both estimation and learning are appearance-based, relying only on input images rather than on hand-coded object or environment models. Using this framework, we derive estimation and learning algorithms for visual recognition, visual "attention," occlusion handling, segmentation, prediction, hierarchical recognition, transformation-invariant recognition, and pose estimation. Experimental results are provided to corroborate the viability of these algorithms.

In addition to their potential applications in machine vision and robotics, the derived algorithms can also be used to understand human and mammalian vision.
We use the visual routines theory to model saccade learning behaviors in infants, visual search/cognitive behaviors in adult subjects, and hemispatial neglect in patients with parietal cortex lesions. The optimal estimation and learning framework is used to interpret the hierarchical and laminar circuitry of the mammalian visual cortex, and to explain neuronal properties such as endstopping, response suppression during free viewing of natural images, and spatiotemporal receptive field development in primary visual cortex.