Combining Data-Driven and Model-Based Cues for Segmentation of Video

In this paper, we present a system for the segmentation of video image sequences that integrates several low-level submodalities with model-based cues. The segmentation model is based on Potts spins with coarse-to-fine dynamics comparable to the real-space renormalisation methods often used in theoretical physics. Motion and average intensity serve as low-level cues, while results from object recognition based on elastic graph matching provide additional, model-based information to aid in segmenting the images. We show that the high-level information improves segmentation performance significantly and enables the system to build up object representations correctly and to refine them as more information becomes available.
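To make the Potts-spin idea concrete, the following is a minimal, illustrative sketch of data-driven Potts segmentation: each pixel carries a spin (segment label), neighbouring pixels with similar intensity are coupled ferromagnetically, and zero-temperature Metropolis sweeps lower the total energy so that similar neighbours drift toward a common label. This is an assumption-laden toy (the function names `potts_segment` and `potts_energy`, the intensity threshold, and the coupling strength `beta` are ours), not the paper's multiscale renormalisation scheme or its integration of motion and model-based cues.

```python
import random

def potts_energy(image, labels, beta=2.0):
    """Total Potts energy: similar-intensity neighbours sharing a label
    lower the energy; dissimilar neighbours sharing a label raise it.
    Each bond is counted once (right and down neighbours only)."""
    h, w = len(image), len(image[0])
    e = 0.0
    for y in range(h):
        for x in range(w):
            for dy, dx in ((1, 0), (0, 1)):
                ny, nx = y + dy, x + dx
                if ny < h and nx < w and labels[y][x] == labels[ny][nx]:
                    # data-driven coupling (threshold 0.1 is an arbitrary choice)
                    j = beta if abs(image[y][x] - image[ny][nx]) < 0.1 else -beta
                    e -= j
    return e

def potts_segment(image, n_labels=2, beta=2.0, sweeps=50, seed=0):
    """Toy single-scale Potts segmentation via zero-temperature
    Metropolis sweeps; accepts only moves that do not raise the energy."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    labels = [[rng.randrange(n_labels) for _ in range(w)] for _ in range(h)]

    def local_energy(y, x, s):
        # energy contribution of site (y, x) if it carried label s
        e = 0.0
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == s:
                j = beta if abs(image[y][x] - image[ny][nx]) < 0.1 else -beta
                e -= j
        return e

    for _ in range(sweeps):
        for y in range(h):
            for x in range(w):
                cand = rng.randrange(n_labels)
                if local_energy(y, x, cand) <= local_energy(y, x, labels[y][x]):
                    labels[y][x] = cand  # never increases the total energy
    return labels
```

In the full system, additional cues (motion, model-based evidence from graph matching) would enter through the couplings, and the coarse-to-fine dynamics would run this relaxation over a hierarchy of spatial scales rather than at a single resolution.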