Probabilistic optimized ranking for multimedia semantic concept detection via RVM

We present a probabilistic, ranking-driven classifier for detecting video semantic concepts such as airplane and building. Most existing concept detection systems use Support Vector Machines (SVM) to detect and rank retrieved video shots. However, the margin maximization principle of SVM does not optimize ranking; it only minimizes classification error. To address this problem, we use a sparse Bayesian kernel model, the relevance vector machine (RVM), as the classifier for semantic concept detection. Based on the automatic relevance determination principle, RVM outputs a posterior probabilistic prediction for each semantic concept. According to the Probability Ranking Principle, this inference output is optimal for ranking the target video shots. The probability output of RVM on individual uni-modal features also facilitates probabilistic fusion of multi-modal evidence to minimize Bayes risk. We demonstrate both theoretically and empirically that RVM outperforms SVM for video semantic concept detection. Experiments on the TRECVID 2007 dataset show that RVM produces statistically significant improvements in MAP scores over SVM-based methods.
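
The ranking and fusion steps described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each modality's classifier already yields a posterior probability per shot, and it fuses them under a conditional-independence (naive Bayes) assumption that the abstract does not specify. The function names, the `prior` parameter, and the toy shot scores are all hypothetical.

```python
import math

def logit(p, eps=1e-9):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def fuse_posteriors(posteriors, prior=0.5):
    """Fuse per-modality posteriors P(c | x_m) into one posterior.

    Assumption (not from the paper): modalities are conditionally
    independent given the concept, so the log-odds add up and the
    prior log-odds is subtracted (M - 1) times.
    """
    m = len(posteriors)
    z = sum(logit(p) for p in posteriors) - (m - 1) * logit(prior)
    return 1.0 / (1.0 + math.exp(-z))

def rank_shots(shot_posteriors, prior=0.5):
    """Rank shots by decreasing fused posterior probability."""
    fused = {s: fuse_posteriors(ps, prior) for s, ps in shot_posteriors.items()}
    return sorted(fused, key=fused.get, reverse=True)

def average_precision(ranking, relevant):
    """Average precision of a ranked list against a set of relevant shots."""
    hits, total = 0, 0.0
    for rank, shot in enumerate(ranking, start=1):
        if shot in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Toy data: two hypothetical modalities (e.g. color and texture) per shot.
shots = {
    "shot_a": [0.9, 0.7],
    "shot_b": [0.4, 0.6],
    "shot_c": [0.2, 0.1],
}
ranking = rank_shots(shots)
print(ranking)
print(average_precision(ranking, {"shot_a", "shot_c"}))
```

Sorting by the fused posterior is what the Probability Ranking Principle prescribes when the probabilities are well calibrated; averaging `average_precision` over concepts gives the MAP score used for evaluation.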
