Use of Generalized Pattern Model for Video Annotation

This paper proposes an integrated framework that combines intra-shot analysis with temporal inter-shot sequence analysis, both based on visual features, to find stable patterns for video annotation. At the shot level, we perform multi-stage kNN classification using global visual features to identify good candidate shots containing the concept. At the sequence level, we aim to find patterns of shot sequences around candidate shots that have consistent statistical characteristics and dynamics. We discretize the shot contents into a fixed set of tokens, transforming the high-dimensional continuous video streams into tractable token sequences. We then extend the soft matching model to reveal video sequence patterns and to flexibly match those patterns around candidate shots. Finally, we combine the local shot matching method with the generalized pattern model, using both visual and text features. Experimental results on the TRECVID 2006 dataset demonstrate that the proposed approach is effective.
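
To make the sequence-level idea concrete, the following is a minimal sketch of token discretization and soft pattern matching around a candidate shot. It is an illustration under stated assumptions, not the implementation evaluated in this paper: nearest-centroid quantization stands in for the discretization step, the soft pattern is modeled as one smoothed token distribution per context slot, and all function names, the window radius, and the smoothing constant `alpha` are hypothetical.

```python
import math
from collections import Counter


def tokenize(features, centroids):
    """Map each shot feature vector to the index of its nearest centroid.

    Assumption: nearest-centroid quantization is a stand-in for whatever
    discretization the full method uses.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(centroids)), key=lambda k: dist2(f, centroids[k]))
        for f in features
    ]


def context_window(tokens, center, radius):
    """Tokens at offsets -radius..+radius around `center`, excluding the center.

    Positions that fall outside the sequence are padded with None.
    """
    return [
        tokens[center + off] if 0 <= center + off < len(tokens) else None
        for off in range(-radius, radius + 1)
        if off != 0
    ]


def learn_soft_pattern(windows, vocab_size, alpha=1.0):
    """Estimate one add-alpha smoothed token distribution per context slot."""
    pattern = []
    for slot in range(len(windows[0])):
        counts = Counter(w[slot] for w in windows if w[slot] is not None)
        total = sum(counts.values()) + alpha * vocab_size
        pattern.append([(counts[t] + alpha) / total for t in range(vocab_size)])
    return pattern


def soft_match_score(pattern, window):
    """Mean log-probability of the window under the pattern (padding skipped)."""
    logs = [
        math.log(slot_probs[tok])
        for slot_probs, tok in zip(pattern, window)
        if tok is not None
    ]
    return sum(logs) / len(logs) if logs else float("-inf")


if __name__ == "__main__":
    # Toy 2-D "visual features" and three hypothetical token centroids.
    centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    shots = [(0.1, 0.0), (0.9, 1.2), (4.8, 5.1)]
    tokens = tokenize(shots, centroids)

    # Context windows observed around positive training shots (made-up data).
    training_windows = [[0, 1], [0, 1], [0, 2]]
    pattern = learn_soft_pattern(training_windows, vocab_size=3)

    # Score the context of the middle shot as a candidate.
    print(soft_match_score(pattern, context_window(tokens, 1, 1)))
```

Because every slot keeps a full smoothed distribution rather than a single required token, a candidate whose neighborhood deviates in one position is penalized but not rejected outright, which is the flexibility the soft matching step is meant to provide.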