A model for figure-ground segmentation by self-organized cue integration

The goal of image segmentation is to divide an image into meaningful regions that support image understanding. The major challenge is the absence of a single reliable information source, or cue, that provides universal segmentation criteria in all situations. Most segmentation methods concentrate on improving one aspect of a cue and are designed only for their particular application domains. We believe that such difficulties can be alleviated by cue integration, which shares information across multiple modalities. This dissertation presents an efficient and reliable approach for segmenting coherent objects from video sequences taken with a stationary camera. A probabilistic cue integration framework is formulated using a modified Bayes' rule. The pixels in each frame are assigned between the figure and ground regions by deriving posterior probabilities from the likelihood models of the bottom-up cues (background subtraction, color, and texture) and the prior probability model of the top-down cue (a generalized object representation). These cue models provide independent, complementary observations that are subsequently trained by self-adaptation toward the segmentation consensus. The contribution of each individual cue under different situations is adjusted by measuring that cue's quality, based on its similarity to the overall segmentation. By allowing cooperation and competition among the cues at the same time, the system maintains and improves its segmentation results in a self-organized manner. Results on various sequences demonstrate the accuracy and robustness of the system at real-time performance. The cue integration model presented in this dissertation unifies bottom-up and top-down cues in a parallel fashion, with each cue corresponding to one or a few Gestalt principles of perceptual organization.
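To make the fusion step concrete, the following is a minimal sketch of reliability-weighted Bayesian cue integration of the kind described above. The function names, the geometric (log-space) weighting scheme, and the agreement measure are illustrative assumptions, not the dissertation's actual formulation.

```python
import numpy as np

def fuse_cues(likelihoods, weights, prior):
    """Combine per-pixel figure likelihoods from several bottom-up cues
    with a top-down prior via a reliability-weighted Bayes' rule.

    likelihoods: list of HxW arrays, each P(observation | figure) for one cue
    weights:     per-cue reliability scores in [0, 1] (hypothetical weighting)
    prior:       HxW array, top-down prior P(figure)
    Returns an HxW array of posterior probabilities P(figure | all cues).
    """
    # Weighted log-space product of cue likelihoods: a cue with weight 0
    # drops out entirely; a cue with weight 1 contributes fully.
    log_fig = sum(w * np.log(l + 1e-9) for l, w in zip(likelihoods, weights))
    log_gnd = sum(w * np.log(1.0 - l + 1e-9) for l, w in zip(likelihoods, weights))
    fig = np.exp(log_fig) * prior
    gnd = np.exp(log_gnd) * (1.0 - prior)
    return fig / (fig + gnd)

def cue_reliability(cue_map, consensus):
    """Score one cue by its agreement with the overall segmentation
    consensus (here simply 1 minus the mean absolute difference)."""
    return 1.0 - np.mean(np.abs(cue_map - consensus))
```

In such a scheme, a cue whose output drifts away from the consensus (e.g. a color model under an abrupt lighting change) receives a lower reliability score and therefore a smaller say in the next frame's posterior.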
Two segmentation systems are compared: one integrates only the bottom-up cues, while the other further incorporates the top-down contribution. The latter system demonstrates superior performance under difficult conditions such as abrupt lighting changes. The top-down process allows the development of adaptive object representations, which can be applied directly to other tasks such as object recognition, detection, and tracking. With the help of this top-down information, the unified integration system is able to handle ambiguous scenes containing multiple objects with occlusions.