News video classification based on multi-modal information fusion

A multi-modal information fusion technique that integrates closed captions, the anchor's speech, and visual information for TV news video classification is presented. Closed-caption characters are recognized from the video, and the resulting single- and double-character phrases are used for classification. The content of the anchor's speech signal, by contrast, is not recognized; instead, it is labeled with pre-trained cluster means using a level-building DP (dynamic programming) algorithm. Visual information, including color and motion features, is extracted from the news footage for classification. Each of these three information sources is classified individually using either the statistical relevance factor (RF) or the SVM (support vector machine) technique, yielding seven different classifiers in total. The outputs of the multiple classifiers are then combined into fused outputs using a modified Bayesian technique. Experiments show that the proposed fusion system increases the classification rate by 14% relative to the best single-modal system. Our Bayesian fusion rule also outperforms the best product rule presented in J. Kittler et al. (1998) by 3%.
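The abstract does not spell out the modified Bayesian rule, so the following is only a minimal sketch of the product-rule baseline it is compared against, in the form commonly attributed to Kittler et al. (1998): the fused score for a class is the class prior raised to the power 1 - R, multiplied by the product of the R classifiers' posteriors for that class. The function name `product_rule` and the numbers in the usage lines are illustrative assumptions, not values from the paper.

```python
from math import prod


def product_rule(posteriors, priors):
    """Fuse per-classifier posteriors with the product rule (baseline sketch).

    posteriors: list of R lists, each holding K class posteriors from one
                classifier (hypothetical inputs, e.g. caption, speech, visual).
    priors:     list of K class priors.
    Returns a list of K fused posteriors that sums to 1.
    """
    r = len(posteriors)
    scores = []
    for k, prior in enumerate(priors):
        # Product of every classifier's posterior for class k.
        joint = prod(p[k] for p in posteriors)
        # The prior appears R times in the product, so divide out R - 1 copies.
        scores.append(prior ** (1 - r) * joint)
    total = sum(scores)
    if total == 0:
        raise ValueError("all fused scores are zero; cannot normalize")
    return [s / total for s in scores]


# Illustrative usage: three classifiers, two classes, equal priors.
fused = product_rule([[0.6, 0.4], [0.7, 0.3], [0.4, 0.6]], [0.5, 0.5])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

With these made-up inputs the fused posterior comes out at roughly 0.7 for the first class, so `predicted_class` is 0 even though the third classifier disagrees. A single classifier that assigns zero probability to a class vetoes it under this rule, which is one known weakness of the product rule.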

[1] Biing-Hwang Juang, et al. Fundamentals of speech recognition, 1993, Prentice Hall signal processing series.

[2] Ren C. Luo, et al. Multisensor fusion and integration: approaches, applications, and future research directions, 2002.

[3] Shigeaki Watanabe, et al. Subspace method to pattern recognition, 1973.

[4] John Zimmerman, et al. Integrated multimedia processing for topic segmentation and classification, 2001, Proceedings 2001 International Conference on Image Processing.

[5] Ming Zhang, et al. SRFW: a simple, fast and effective text classification algorithm, 2002, Proceedings of the International Conference on Machine Learning and Cybernetics.

[6] Chih-Jen Lin, et al. Probability Estimates for Multi-class Classification by Pairwise Coupling, 2003, J. Mach. Learn. Res.

[7] Jiri Matas, et al. On Combining Classifiers, 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[8] Peng Wang, et al. A hybrid approach to news video classification multimodal features, 2003, Proceedings of the 2003 Joint Fourth International Conference on Information, Communications and Signal Processing and the Fourth Pacific Rim Conference on Multimedia.

[9] Nirwan Ansari, et al. Adaptive decision fusion for unequiprobable sources, 1997.

[10] Wei-Hao Lin, et al. News video classification using SVM-based multimodal classifiers and combination strategies, 2002, MULTIMEDIA '02.

[11] Riichiro Mizoguchi, et al. Topic recognition for news speech based on keyword spotting, 1998, ICSLP.