News video classification using SVM-based multimodal classifiers and combination strategies

Video classification is the first step toward multimedia content understanding. When video is classified into conceptual categories, it is usually desirable to combine evidence from multiple modalities. However, combination strategies in previous studies were usually ad hoc. We investigate a meta-classification combination strategy using Support Vector Machine, and compare it with probability-based strategies. Text features from closed-captions and visual features from images are combined to classify broadcast news video. The experimental results show that combining multimodal classifiers can significantly improve recall and precision, and our meta-classification strategy gives better precision than the approach of taking the product of the posterior probabilities.

[1]  Tomaso A. Poggio,et al.  A general framework for object detection , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[2]  Zhu Liu,et al.  Multimedia content analysis-using both audio and visual clues , 2000, IEEE Signal Process. Mag..

[3]  Alexander Hauptmann,et al.  Meta-Classification of Multimedia Classifiers , 2002, KDMCD.

[4]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[5]  Bernhard Schölkopf,et al.  Comparing support vector machines with Gaussian kernels to radial basis function classifiers , 1997, IEEE Trans. Signal Process..

[6]  Zhu Liu,et al.  Classification TV programs based on audio information using hidden Markov model , 1998, 1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175).

[7]  Zhu Liu,et al.  Integration of multimodal features for video scene classification based on HMM , 1999, 1999 IEEE Third Workshop on Multimedia Signal Processing (Cat. No.99TH8451).

[8]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[9]  Vladimir Cherkassky,et al.  The Nature Of Statistical Learning Theory , 1997, IEEE Trans. Neural Networks.

[10]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[11]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[12]  Takeo Kanade,et al.  Intelligent Access to Digital Video: Informedia Project , 1996, Computer.

[13]  Cheng Lu,et al.  Classification of summarized videos using hidden markov models on compressed chromaticity signatures , 2001, MULTIMEDIA '01.

[14]  Gang Wei,et al.  Video classification based on HMM using text and faces , 2000, 2000 10th European Signal Processing Conference.

[15]  Jiri Matas,et al.  On Combining Classifiers , 1998, IEEE Trans. Pattern Anal. Mach. Intell..