Video Annotation by Active Learning and Cluster Tuning

Supervised and semi-supervised learning are frequently used to annotate videos by mapping low-level features to high-level semantic concepts. Though they work well for certain concepts, their performance is still far from satisfactory because of the large gap between the features and the semantics. The main limitation of these methods is that the information contained in a limited number of labeled training samples can hardly represent the distributions of the semantic concepts. In this paper, we propose a novel semi-automatic video annotation framework, active learning with clustering tuning, to address these shortcomings of current video annotation solutions. In this framework, an initial training set is first constructed by clustering the entire video dataset. An SVM-based active learning scheme is then proposed, which aims to maximize the margin of the SVM classifier by manually labeling a small, selectively chosen set of samples. Moreover, in each round of active learning, we tune (refine) the clustering results based on the prediction results of the current stage, which helps select the most informative samples during active learning and further improves the final annotation accuracy in the post-processing step. Experimental results show that the proposed scheme outperforms typical active learning algorithms in terms of both annotation accuracy and stability.
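
As a rough illustration of two steps named above, the following Python sketch shows one plausible way to build a cluster-based initial training set and to pick samples for manual labeling. It is a sketch under stated assumptions, not the paper's implementation: the function names `initial_training_set` and `select_queries` are hypothetical, the cluster assignments and SVM decision values are assumed to come from any clustering routine and any trained SVM, and the "closest to the decision boundary" rule is a common margin-based heuristic swapped in for the selection criterion, which the abstract does not spell out. The cluster tuning step is left out because its update rule is not given here.

```python
from math import dist


def initial_training_set(features, cluster_ids):
    """Pick one representative per cluster: the member nearest the cluster centroid.

    Assumption: one sample per cluster is enough to seed training; the abstract
    only says the initial set is built from a clustering of the dataset.
    """
    clusters = {}
    for idx, cid in enumerate(cluster_ids):
        clusters.setdefault(cid, []).append(idx)

    chosen = []
    for cid in sorted(clusters):
        members = clusters[cid]
        dim = len(features[members[0]])
        # Centroid = per-dimension mean of the cluster's feature vectors.
        centroid = [
            sum(features[i][d] for i in members) / len(members) for d in range(dim)
        ]
        chosen.append(min(members, key=lambda i: dist(features[i], centroid)))
    return chosen


def select_queries(decision_values, labeled, batch_size):
    """Return unlabeled sample indices whose SVM decision value is closest to zero.

    Assumption: samples near the decision boundary are treated as the most
    informative ones to label next (a standard margin-based heuristic).
    """
    candidates = [i for i in range(len(decision_values)) if i not in labeled]
    candidates.sort(key=lambda i: abs(decision_values[i]))
    return candidates[:batch_size]
```

In a full loop, the samples returned by `initial_training_set` would be labeled manually and used to train the first SVM; each later round would call `select_queries` on the current decision values, add the newly labeled samples, and retrain.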