Context and configuration based scene classification

The problem of scene classification is one of the significant open challenges in the field of machine vision. During the past few years, there has been a resurgence of interest in this area due to the potential applications in content-based digital image database indexing. Most proposed solutions have either skirted the problem by using textual annotation or have employed image statistics such as color histograms or local textural measures. While adequate for some tasks, these approaches are unable to capture the global configuration of a scene, which seems to be of critical significance in perceptual judgments of scene similarity. The key question this thesis addresses is how to encode a scene so as to incorporate its overall structure in a manner that would allow subsequent generalization to other members of the scene class. We present a novel approach, called "configural recognition", as a partial solution to this problem. The main features of this approach are its use of qualitative spatial and photometric relationships within and across regions in low-resolution images. The emphasis on qualitative measures endows the approach with an impressive generalization ability and the use of low-resolution images renders it computationally efficient. We present results of testing this approach on a large database of natural scenes. We also describe how qualitative scene concepts may be automatically learned from examples. The applicability of the configural recognition approach is not limited to natural scenes; we conclude by describing some other domains for which the approach seems well suited.