Vision and action

Abstract: Our work on active vision has recently focused on the computational modelling of navigational tasks, guided by the idea of approaching vision for behavioural systems in the form of modules that are directly related to perceptual tasks. These studies led us to branch out in various directions and to inquire into the problems that must be addressed in order to obtain an overall understanding of perceptual systems. In this paper, we present our views on the architecture of vision systems, on how to tackle the design and analysis of perceptual systems, and on promising directions for future research. Our suggested approach to understanding behavioural vision, and to realizing the relationships between perception and action, builds on two earlier approaches, the Medusa philosophy1 and the Synthetic approach2. The resulting framework calls for synthesizing an artificial vision system by studying vision competences of increasing complexity and, at the same time, pursuing the integration of the perceptual components with action and learning modules. We expect that future computer vision research will progress in tight collaboration with the many other disciplines that take empirical approaches to vision, that is, with the study of biological vision. Throughout the paper, we describe biological findings that motivate computational arguments which we believe will influence studies of computer vision in the near future.

[1]  F. Girosi,et al.  Networks for approximation and learning , 1990, Proc. IEEE.

[2]  J. Todd,et al.  Ordinal structure in the visual perception and cognition of smoothly curved surfaces. , 1989, Psychological review.

[3]  Yiannis Aloimonos,et al.  Active vision , 1988, International Journal of Computer Vision.

[4]  Rodney A. Brooks,et al.  A Robust Layered Control System For A Mobile Robot , 1986 .

[5]  J. A. Movshon,et al.  Visual processing of moving images , 1990 .

[6]  N. Logothetis,et al.  Viewer-Centered Object Recognition in Monkeys , 1994 .

[7]  Patrick Henry Winston,et al.  Learning structural descriptions from examples , 1970 .

[8]  E. Johnston Systematic distortions of shape from stereopsis , 1991, Vision Research.

[9]  Alex P. Pentland,et al.  From Pixels to Predicates: Recent Advances in Computational and Robotic Vision , 1986, IEEE Expert.

[10]  M. Farah Visual Agnosia: Disorders of Object Recognition and What They Tell Us about Normal Vision , 1990 .

[11]  R. Gregory,et al.  Distortion of Visual Space as Inappropriate Constancy Scaling , 1963, Nature.

[12]  Jean-Yves Herve Navigational vision , 1993 .

[13]  G. Horridge The evolution of visual processing and the construction of seeing systems , 1987, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[14]  J J Koenderink,et al.  Affine structure from motion. , 1991, Journal of the Optical Society of America. A, Optics and image science.

[15]  S. Zeki A vision of the brain , 1993 .

[16]  Stephen M. Omohundro,et al.  Best-First Model Merging for Dynamic Learning and Recognition , 1991, NIPS.

[17]  John Moody,et al.  Fast Learning in Networks of Locally-Tuned Processing Units , 1989, Neural Computation.

[18]  T. Landelius Behavior Representation by Growing a Learning Tree , 1993 .

[19]  George W. Ernst,et al.  GPS : a case study in generality and problem solving , 1971 .

[20]  H. Gelernter,et al.  Realization of a geometry theorem proving machine , 1959, IFIP Congress.

[21]  James T. Todd,et al.  Ordinal structure in the visual perception and cognition of smoothly curved surfaces. , 1989 .

[22]  Leslie G. Ungerleider,et al.  Pathways for motion analysis: Cortical connections of the medial superior temporal and fundus of the superior temporal visual areas in the macaque , 1990, The Journal of comparative neurology.

[23]  Patrick Henry Winston,et al.  The psychology of computer vision , 1976, Pattern Recognit.

[24]  A. Pentland,et al.  Non-rigid motion and structure from contour , 1991, Proceedings of the IEEE Workshop on Visual Motion.

[25]  Tomaso A. Poggio,et al.  Learning of visual modules from examples: A framework for understanding adaptive visual performance , 1992, CVGIP Image Underst.

[26]  J. Cronly-Dillon,et al.  Vision and visual dysfunction. , 1994, Journal of cognitive neuroscience.

[27]  R. Wurtz,et al.  Sensitivity of MST neurons to optic flow stimuli. I. A continuum of response selectivity to large-field stimuli. , 1991, Journal of neurophysiology.

[28]  R. Weale Analysis of Visual Behaviour , 1983 .

[29]  L. Jakobson,et al.  A neurological dissociation between perceiving objects and grasping them , 1991, Nature.

[30]  Azriel Rosenfeld,et al.  Machine Vision and Learning: Research Issues and Directions , 1994 .

[31]  G. Johansson Visual perception of biological motion and a model for its analysis , 1973 .

[32]  G. Humphreys,et al.  To See But Not To See: A Case Study Of Visual Agnosia , 1987 .

[33]  Z. Pylyshyn,et al.  Vision and Action: The Control of Grasping , 1990 .

[34]  Randal C. Nelson,et al.  Detecting activities , 1993, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[35]  S. Ullman,et al.  The interpretation of visual motion , 1977 .

[36]  Yiannis Aloimonos,et al.  Purposive and qualitative active vision , 1990, Proceedings of the 10th International Conference on Pattern Recognition.

[37]  J J Hopfield,et al.  Neural networks and physical systems with emergent collective computational abilities. , 1982, Proceedings of the National Academy of Sciences of the United States of America.

[38]  Stephen M. Omohundro,et al.  Bumptrees for Efficient Function, Constraint and Classification Learning , 1990, NIPS.

[39]  Dennis Gabor,et al.  Theory of communication , 1946 .

[40]  Ronen Basri,et al.  Recognition by Linear Combinations of Models , 1991, IEEE Trans. Pattern Anal. Mach. Intell.

[41]  S. Zeki The visual image in mind and brain. , 1992, Scientific American.

[42]  D C Van Essen,et al.  Functional properties of neurons in middle temporal visual area of the macaque monkey. I. Selectivity for stimulus direction, speed, and orientation. , 1983, Journal of neurophysiology.

[43]  Elie Bienenstock,et al.  Neural Networks and the Bias/Variance Dilemma , 1992, Neural Computation.

[44]  T S Collett,et al.  The Interaction of Oculomotor Cues and Stimulus Size in Stereoscopic Depth Constancy , 1991, Perception.

[45]  Guy A. Orban,et al.  The Analysis of Motion Signals and the Nature of Processing in the Primate Visual System , 1992 .

[46]  R. Bajcsy Active perception , 1988 .

[47]  Randal C. Nelson,et al.  Qualitative recognition of motion using temporal texture , 1992, CVGIP Image Underst.

[48]  Olivier Faugeras,et al.  Three-Dimensional Computer Vision , 1993 .

[49]  D. Jacobs Space Efficient 3D Model Indexing , 1992 .

[50]  Leslie G. Ungerleider,et al.  Cortical connections of visual area MT in the macaque , 1986, The Journal of comparative neurology.

[51]  Leslie G. Ungerleider Two cortical visual systems , 1982 .

[52]  K. Tanaka,et al.  Analysis of motion of the visual field by direction, expansion/contraction, and rotation cells clustered in the dorsal part of the medial superior temporal area of the macaque monkey. , 1989, Journal of neurophysiology.