IBM Research Report: Learning Speech-Based Video Concept Models Using WordNet

Modeling concepts with supervised or unsupervised machine learning is becoming increasingly important for video semantic indexing, retrieval, and filtering applications. Videos naturally contain multimodal data (audio, speech, visual, and text) that combine to convey the overall semantic concepts. However, most prior research has been conducted within a single domain. In this paper we propose an unsupervised technique that uses WordNet to build context-independent keyword lists for speech-based modeling of the desired concepts. Furthermore, we propose an extended speech-based video concept (ESVC) model that reorders and extends these keyword lists through supervised learning based on multimodality annotation. Experimental results show that the context-independent models achieve performance comparable to conventional supervised learning algorithms, and that the ESVC model achieves relative improvements of about 53% and 28.4% on two test subsets of the TRECVID 2003 corpus over a prior state-of-the-art speech-based video concept detection algorithm.
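The abstract does not spell out how the keyword lists are derived, so the following is only a minimal sketch of one plausible reading: expand a concept name through synonym and hyponym links and weight each keyword by its distance from the concept. The toy lexicon, the function names `build_keyword_list` and `score_transcript`, and the inverse-depth weighting are all assumptions made for illustration; they stand in for WordNet lookups and are not taken from the paper.

```python
from collections import deque

# Hypothetical stand-in for WordNet relations (hand-written, not real WordNet output).
TOY_LEXICON = {
    "weather": {"synonyms": ["conditions"], "hyponyms": ["storm", "sunshine"]},
    "storm": {"synonyms": ["tempest"], "hyponyms": ["thunderstorm", "blizzard"]},
    "sunshine": {"synonyms": [], "hyponyms": []},
}


def build_keyword_list(concept, lexicon, max_depth=2):
    """Expand a concept name breadth-first into (keyword, weight) pairs.

    Assumed weighting: a word reached at depth d gets weight 1 / (d + 1),
    and a synonym inherits the depth of the word it belongs to.
    """
    weights = {concept: 1.0}
    queue = deque([(concept, 0)])
    while queue:
        word, depth = queue.popleft()
        entry = lexicon.get(word, {})
        # Synonyms sit at the same depth as the word they come from.
        for synonym in entry.get("synonyms", []):
            weights.setdefault(synonym, 1.0 / (depth + 1))
        if depth >= max_depth:
            continue
        # Hyponyms (more specific terms) sit one level deeper.
        for hyponym in entry.get("hyponyms", []):
            if hyponym not in weights:
                weights[hyponym] = 1.0 / (depth + 2)
                queue.append((hyponym, depth + 1))
    # Highest weight first; ties broken alphabetically for a stable order.
    return sorted(weights.items(), key=lambda item: (-item[1], item[0]))


def score_transcript(transcript, keywords):
    """Sum the weights of keywords found in a whitespace-tokenized transcript."""
    weight_of = dict(keywords)
    return sum(weight_of.get(token, 0.0) for token in transcript.lower().split())


if __name__ == "__main__":
    keywords = build_keyword_list("weather", TOY_LEXICON)
    for keyword, weight in keywords:
        print(f"{keyword}: {weight:.2f}")
    print(score_transcript("a storm and a blizzard hit", keywords))
```

Because the list is built from the lexicon alone, it needs no annotated training data, which matches the context-independent setting described above. A supervised step such as the ESVC model could then reorder or extend the list, but that step is not sketched here.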
