Typicality ranking: beyond accuracy for video semantic annotation

In video annotation, the typicality (or relevance degree) of relevant samples with respect to a given concept generally varies. We therefore argue that it is more reasonable to rank typical relevant samples higher than non-typical ones. However, the labels of the training data usually only distinguish relevant from irrelevant; that is, typical and non-typical training samples contribute equally to the learning process, so the scores learned for the unlabeled data cannot measure typicality well. Accordingly, three pre-processing approaches are proposed to relax the labels of the training data into real-valued typicality scores, which are then propagated to the unlabeled data using manifold ranking. Meanwhile, we propose a novel criterion, Average Typicality Precision (ATP), to replace the widely used Average Precision (AP) for evaluating video typicality ranking algorithms: while AP measures how many relevant samples appear at the top of the annotation rank list, it ignores the typicality order of those samples, which ATP takes into account. Experiments conducted on the TRECVID data set demonstrate that this typicality ranking scheme is more consistent with human perception than conventional accuracy-based ranking schemes.
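The propagation step described above can be illustrated with the standard manifold-ranking update f ← αSf + (1 − α)y (Zhou et al., "Learning with Local and Global Consistency"). The sketch below is not the paper's implementation: the Gaussian affinity, the parameter values, and the function name are illustrative assumptions; the labeled entries of `y` would hold the relaxed real-valued typicality scores, with unlabeled entries set to zero.

```python
import numpy as np

def manifold_ranking(X, y, alpha=0.9, sigma=0.5, n_iter=500):
    """Propagate real-valued typicality scores over a data manifold.

    X : (n, d) feature matrix for labeled and unlabeled samples.
    y : (n,) initial scores; labeled samples carry their (relaxed)
        typicality scores, unlabeled samples are set to 0.
    Standard manifold-ranking iteration: f <- alpha*S f + (1-alpha)*y.
    """
    # Gaussian affinity matrix with a zeroed diagonal (no self-edges).
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Iterate to (approximate) convergence; spectral radius of
    # alpha*S is below 1, so the iteration is a contraction.
    f = y.astype(float).copy()
    for _ in range(n_iter):
        f = alpha * (S @ f) + (1.0 - alpha) * y
    return f
```

In this toy setup, a single labeled typical sample passes a high score to its near neighbors on the same manifold, while samples in a distant cluster stay near zero, which is the ranking behavior the paper relies on.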

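To make the evaluation argument concrete, standard non-interpolated AP can be computed from a binary relevance list as below. This is the textbook AP formula, not code from the paper; it also shows why AP is blind to typicality order: because relevance is binary, swapping a typical and a non-typical relevant sample between two ranks leaves AP unchanged.

```python
def average_precision(relevance):
    """Non-interpolated Average Precision.

    relevance : sequence of 0/1 flags in ranked order (1 = relevant).
    Returns the mean of precision@k over the ranks k of relevant items.
    """
    hits = 0
    precisions = []
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)  # precision at this relevant rank
    return sum(precisions) / hits if hits else 0.0
```

Because only the 0/1 pattern matters, a ranking that places a non-typical relevant shot above a typical one scores exactly the same AP as the reverse ordering; a typicality-aware criterion such as ATP is needed to separate the two.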