Optimizing Training Set Construction for Video Semantic Classification

We exploit distribution-based criteria to optimize training set construction for large-scale video semantic classification. Due to the large gap between low-level features and high-level semantics, as well as the high diversity of video data, it is difficult to represent the prototypes of semantic concepts with a training set of limited size. In video semantic classification, most learning-based approaches require a large training set to achieve good generalization capacity, which entails large amounts of labor-intensive manual labeling. However, it is observed that the generalization capacity of a classifier depends more on the geometrical distribution of the training data than on its size. We argue that a training set that captures most of the temporal and spatial distribution information of the whole data set can achieve good performance even when its size is limited. To capture the geometrical distribution characteristics of a given video collection, we propose four metrics for constructing/selecting an optimal training set: salience, temporal dispersiveness, spatial dispersiveness, and diversity. Based on these metrics, we further propose a set of optimization rules that capture as much of the distribution information of the whole data as possible with a training set of a given size. Experimental results demonstrate that these rules are effective for training set construction in video semantic classification and significantly outperform random training set selection.
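The abstract does not reproduce the formal definitions of the four metrics, so the sketch below is only a minimal illustration of the general idea behind the diversity criterion: a greedy farthest-first traversal that selects a fixed-size training set spread out over the feature space rather than drawn at random. The function name, the synthetic shot-level feature matrix, and the use of Euclidean distance are assumptions made for illustration, not the paper's actual formulation.

```python
import numpy as np

def farthest_first_selection(features, budget, seed=0):
    """Greedy farthest-first traversal (a stand-in for a diversity
    criterion): pick `budget` samples that spread out over the
    feature space. `features` is an (n_samples, n_dims) array."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    # Start from a random sample; every point tracks its distance
    # to the closest already-selected sample.
    selected = [int(rng.integers(n))]
    min_dist = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < budget:
        # Add the sample farthest from the current selection,
        # then update each point's distance to the selection.
        idx = int(np.argmax(min_dist))
        selected.append(idx)
        dist = np.linalg.norm(features - features[idx], axis=1)
        min_dist = np.minimum(min_dist, dist)
    return selected

# Hypothetical usage: pick 50 diverse shots out of 10,000
# shot-level feature vectors for manual labeling.
shots = np.random.rand(10000, 128)
train_idx = farthest_first_selection(shots, budget=50)
```

Under this reading, the labeling budget is spent on samples that jointly cover the feature-space distribution, which is the intuition the abstract contrasts with random training set selection.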
