Natural Object Recognition: A Theoretical Framework and Its Implementation

Most work in visual recognition by computer has focused on recognizing objects by their geometric shape, or by the presence or absence of some prespecified collection of locally measurable attributes (e.g., spectral reflectance, texture, or distinguished markings). However, most entities in the natural world defy compact description of their shapes and lack characteristic features with discriminatory power. As a result, image-understanding research has achieved little success toward recognition in natural scenes. We offer a fundamentally new approach to visual recognition that avoids these limitations and has been used to recognize trees, bushes, grass, and trails in ground-level scenes of a natural environment.

1 Introduction

The key scientific question addressed by our research has been how to design a computer vision system that can approach human-level performance in the interpretation of natural scenes such as the one shown in Figure 1. We offer a new paradigm for the design of computer vision systems that holds promise for achieving near-human competence, and we report experimental results from a system that implements the theory and demonstrates its recognition abilities in a natural domain of limited geographic extent. The purpose of this paper is to review the key ideas underlying our approach (discussed in detail in previous publications [12, 13]) and to focus on the results of an ongoing experimental evaluation of these ideas as embodied in an implemented system called Condor.

Figure 1: A natural outdoor scene of the experimentation site.

Examining why traditional approaches to computer vision fail to interpret ground-level scenes of the natural world reveals four fundamental problems:

Universal partitioning: Most scene-understanding systems begin by segmenting an image into homogeneous regions using a single partitioning algorithm applied to the entire image. If that partitioning is wrong, the interpretation must also be wrong, no matter how the system assigns semantic labels to those regions. Unfortunately, universal partitioning algorithms are notoriously poor delineators of natural objects in ground-level scenes.

Shape: Many man-made artifacts can be recognized by matching a 3D geometric model with features extracted from an image [1, 2, 4, 6, 7, 9, 15], but most natural objects cannot be recognized this way. Natural objects are assigned names on the basis of their setting, appearance, and context, …