Beyond Accuracy: Typicality Ranking for Video Annotation

In this paper, we address the issue of typicality ranking for video annotation and propose a novel criterion, average typicality precision (ATP), to replace the commonly used average precision (AP) for evaluating the performance of video annotation algorithms. General annotation methods care only about the number of true-positive samples at the top of the ranked list; they do not consider the order of these samples. We argue that it is more reasonable to rank "typical" true-positive samples above non-typical ones, which is exactly what the proposed ATP measures. However, the labels of the training data generally only distinguish true from false; that is, typical and non-typical training samples contribute equally to the learning process, so the labels inferred for unlabeled data from these training data cannot properly measure typicality. In this paper, we relax the labels of the training data to real-valued typicality scores in a pre-processing stage, which is accomplished by three approaches: density estimation, user feedback, and active learning. The typicality scores of the training data are then propagated to the unlabeled data using manifold ranking. Experiments conducted on the TRECVID data set demonstrate that this typicality ranking scheme is more consistent with human perception than conventional accuracy-based ranking schemes.
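To make the pipeline described above concrete, the sketch below illustrates two of its steps under stated assumptions: relaxing binary positive labels to real-valued typicality scores via a simple Gaussian kernel density estimate (one of the three pre-processing approaches mentioned), and propagating those scores to unlabeled samples with the standard manifold-ranking propagation f = αSf + (1−α)y in its closed form. Function names, the Gaussian affinity, and the parameters `bandwidth`, `sigma`, and `alpha` are illustrative assumptions, not the paper's actual implementation details, which the abstract does not specify.

```python
import numpy as np


def relax_labels_by_density(pos_features, bandwidth=1.0):
    """Relax binary positive labels to real-valued typicality scores
    with a Gaussian kernel density estimate: positives lying in denser
    regions of the positive class are treated as more typical.
    (Illustrative choice; the paper also considers user feedback and
    active learning for this step.)"""
    sq_dists = np.sum(
        (pos_features[:, None, :] - pos_features[None, :, :]) ** 2, axis=-1)
    density = np.exp(-sq_dists / (2.0 * bandwidth ** 2)).mean(axis=1)
    # Normalize to (0, 1] so the most typical positive gets score 1.
    return density / density.max()


def propagate_typicality(features, typicality_scores, labeled_mask,
                         sigma=1.0, alpha=0.9):
    """Propagate real-valued typicality scores from labeled to unlabeled
    samples with manifold ranking (closed-form solution of the iteration
    f <- alpha * S f + (1 - alpha) * y)."""
    # Gaussian affinity matrix with zero diagonal.
    sq_dists = np.sum(
        (features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    S = D_inv_sqrt @ W @ D_inv_sqrt

    # Initial scores: relaxed typicality for labeled samples, 0 elsewhere.
    y = np.where(labeled_mask, typicality_scores, 0.0)

    # f* = (1 - alpha) * (I - alpha * S)^{-1} y
    n = features.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * S, (1.0 - alpha) * y)
```

Ranking the unlabeled samples by the returned scores would then place typical positives ahead of atypical ones, which is the behavior ATP is designed to reward.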
