Multi-Modal Person-Profiles from Broadcast News Video

The need to analyze and index large amounts of video information is becoming more important as the way people consume media continues to change. In recent years, the push to address multimedia indexing and retrieval applications in a holistic, multi-modal way has attracted considerable attention. In this work we propose the combined use of audio, visual, and textual information for the automatic indexing of broadcast news video to create person-profiles. Indexing videos in this manner enables the automatic creation of multimedia person databases and also supports existing video analysis tasks. We test our algorithm on news data from NBC and present areas for future exploration.
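
As a concrete illustration of the indexing idea, below is a minimal sketch of how a multi-modal person-profile record might be organized. The Python structure, field names, and example values are assumptions chosen for illustration only and do not reflect the paper's actual data model or algorithm.

```python
# Illustrative sketch only: a person-profile aggregating evidence from the
# visual, audio, and textual modalities. All names here are hypothetical.
from dataclasses import dataclass, field
from typing import List


@dataclass
class FaceTrack:
    """A detected face track in the video (visual modality)."""
    shot_id: int
    start_frame: int
    end_frame: int
    confidence: float


@dataclass
class SpeechSegment:
    """A speech segment attributed to the person (audio modality)."""
    start_sec: float
    end_sec: float
    transcript: str


@dataclass
class PersonProfile:
    """Aggregates cross-modal evidence about one person in a news broadcast."""
    name: str
    face_tracks: List[FaceTrack] = field(default_factory=list)
    speech_segments: List[SpeechSegment] = field(default_factory=list)
    # e.g. closed-caption or transcript lines that mention the person by name
    text_mentions: List[str] = field(default_factory=list)


if __name__ == "__main__":
    # Hypothetical example: evidence about a news anchor gathered from one broadcast.
    profile = PersonProfile(name="Jane Doe")
    profile.face_tracks.append(FaceTrack(shot_id=12, start_frame=300, end_frame=420, confidence=0.93))
    profile.speech_segments.append(SpeechSegment(start_sec=10.0, end_sec=14.5, transcript="Good evening."))
    profile.text_mentions.append("Anchor Jane Doe reports from Washington.")
    print(profile.name, len(profile.face_tracks), "face track(s) indexed")
```

Such a record could then back a searchable person database or be fed into downstream video analysis tasks, which is the use case the abstract describes.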
