Most work in visual recognition by computer has focused on recognizing objects by their geometric shape, or by the presence or absence of some prespecified collection of locally measurable attributes (e.g., spectral reflectance, texture, or distinguished markings). On the other hand, most entities in the natural world defy compact description of their shapes, and have no characteristic features with discriminatory power. As a result, image-understanding research has achieved little success toward recognition in natural scenes. We offer a fundamentally new approach to visual recognition that avoids these limitations and has been used to recognize trees, bushes, grass, and trails in ground-level scenes of a natural environment.

1 Introduction

The key scientific question addressed by our research has been the design of a computer vision system that can approach human-level performance in the interpretation of natural scenes such as that shown in Figure 1. We offer a new paradigm for the design of computer vision systems that holds promise for achieving near-human competence, and report the experimental results of a system implementing that theory, which demonstrates its recognition abilities in a natural domain of limited geographic extent. The purpose of this paper is to review the key ideas underlying our approach (discussed in detail in previous publications [12, 13]) and to focus on the results of an ongoing experimental evaluation of these ideas as embodied in an implemented system called Condor.

Figure 1: A natural outdoor scene of the experimentation site.

When examining the reasons why the traditional approaches to computer vision fail in the interpretation of ground-level scenes of the natural world, four fundamental problems become apparent:

Universal partitioning: Most scene-understanding systems begin with the segmentation of an image into homogeneous regions using a single partitioning algorithm applied to the entire image.
If that partitioning is wrong, then the interpretation must also be wrong, no matter how a system assigns semantic labels to those regions. Unfortunately, universal partitioning algorithms are notoriously poor delineators of natural objects in ground-level scenes.

Shape: Many man-made artifacts can be recognized by matching a 3D geometric model with features extracted from an image [1, 2, 4, 6, 7, 9, 15], but most natural objects cannot be so recognized. Natural objects are assigned names on the basis of their setting, appearance, and context, …

[1] Jay M. Tenenbaum. On locating objects by their distinguishing features in multisensory images. Comput. Graph. Image Process., 1973.
[2] T. Garvey. Perceptual strategies for purposive vision. 1975.
[3] D. Kriegman et al. On recognizing and positioning curved 3D objects from image contours. Proceedings, Workshop on Interpretation of 3D Scenes, 1989.
[4] W. Grimson et al. Model-Based Recognition and Localization from Sparse Range or Tactile Data. 1984.
[5] Thomas M. Strat et al. The Core Knowledge System. 1987.
[6] Robert C. Bolles et al. 3DPO: A Three-Dimensional Part Orientation System. IJCAI, 1986.
[7] D. W. Thompson et al. Three-dimensional model matching from an unconstrained viewpoint. Proceedings, 1987 IEEE International Conference on Robotics and Automation, 1987.
[8] Thomas M. Strat et al. Natural Object Recognition. Springer Series in Perception Engineering, 1992.
[9] David J. Kriegman et al. On Recognizing and Positioning Curved 3-D Objects from Image Contours. IEEE Trans. Pattern Anal. Mach. Intell., 1990.
[10] Olivier D. Faugeras et al. A 3-D Recognition and Positioning Algorithm Using Geometrical Matching Between Primitive Surfaces. IJCAI, 1983.
[11] Thomas M. Strat et al. A Knowledge-Based Architecture for Organizing Sensory Data. IAS, 1986.