Automatic video annotation through search and mining

Conventional approaches to video annotation focus predominantly on supervised detection of a limited set of concepts, while unsupervised annotation with an unrestricted vocabulary remains largely unexplored. This work exploits the content overlap among news videos to annotate a video automatically, mining similar videos whose transcripts reinforce, filter, and improve the original annotations. The algorithm is a two-step process of search followed by mining. Given a query video consisting of visual content and speech-recognized transcripts, similar videos are first ranked by a multimodal search. The transcripts associated with these similar videos are then mined to extract keywords for the query. Extensive experiments over the TRECVID 2005 corpus show that the proposed approach outperforms annotation that mines only the query video's own transcript. This work represents the first attempt at unsupervised automatic video annotation that leverages overlapping video content.
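The abstract describes the pipeline only at a high level. The following is a minimal sketch of the search-then-mine idea, not the authors' implementation: the corpus layout, the linear fusion weight alpha, the TF-IDF keyword scoring, and all function and parameter names are assumptions introduced here for illustration.

```python
# Hypothetical sketch: rank corpus videos by fused text + visual similarity,
# then mine the transcripts of the top-ranked neighbors for keywords.
from collections import Counter

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def annotate(query_transcript, query_visual, corpus_transcripts, corpus_visuals,
             top_k=10, alpha=0.5, n_keywords=5):
    """Annotate a query video by mining transcripts of its nearest neighbors."""
    # Step 1: multimodal search -- fuse transcript and visual similarities.
    vec = TfidfVectorizer(stop_words="english")
    tfidf = vec.fit_transform(list(corpus_transcripts) + [query_transcript])
    q = len(corpus_transcripts)
    text_sim = cosine_similarity(tfidf[q], tfidf[:q]).ravel()
    visual_sim = cosine_similarity(np.asarray(query_visual).reshape(1, -1),
                                   np.asarray(corpus_visuals)).ravel()
    fused = alpha * text_sim + (1 - alpha) * visual_sim
    neighbors = np.argsort(fused)[::-1][:top_k]

    # Step 2: mining -- score candidate keywords from neighbor transcripts,
    # weighting each term by the neighbor's similarity to the query.
    vocab = vec.get_feature_names_out()
    scores = Counter()
    for i in neighbors:
        row = tfidf[i].toarray().ravel()
        for j in row.nonzero()[0]:
            scores[vocab[j]] += fused[i] * row[j]
    return [word for word, _ in scores.most_common(n_keywords)]
```

In this sketch the fusion is a simple weighted sum of cosine similarities; the paper's multimodal ranking and keyword filtering could differ substantially, so the code should be read only as an illustration of the two-step search-and-mine structure.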
