Complementary video and audio analysis for broadcast news archives

Abstract

The Informedia Digital Video Library system extracts information from digitized video sources and allows full-content search and retrieval over all extracted data. This extracted 'metadata' enables users to rapidly find interesting news stories and to quickly judge whether a retrieved TV news story is actually relevant to their query. This article highlights two unique features: named faces and location analysis. The named-faces feature automatically associates a name with a face, while location analysis lets the user visually follow the action of a news story on a map and query for news stories by graphically selecting a region on that map.

1 The Informedia Digital Video Library Project

The Informedia Digital Video Library project [1], initiated in 1994, uniquely combines integrated speech, image and natural language understanding to process broadcast video. The project's goal is to allow search and retrieval in the video medium comparable to what is available today for text alone. To enable this access to video, Carnegie Mellon's Sphinx speech recognition system generates fast, high-accuracy automatic transcripts of broadcast news stories, and closed captions are incorporated where available. Image processing detects scene boundaries, recognizes faces and supports image similarity comparisons. Text visible on screen is recognized through video OCR and made searchable. All extracted data is indexed into a searchable digital video library [2], where users can pose queries and retrieve relevant news stories as results. The …
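The abstract does not say how region queries on the map are implemented. As a minimal sketch, assuming each story's place names have already been extracted and geocoded to latitude/longitude pairs, a "select a region on the map" query reduces to a point-in-box filter over those locations. The names `NewsStory`, `BoundingBox` and `stories_in_region` are hypothetical illustrations, not Informedia's actual interfaces.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BoundingBox:
    """A map region selected by the user (hypothetical representation)."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        # Plain lat/lon box test; does not handle boxes crossing the antimeridian.
        return self.south <= lat <= self.north and self.west <= lon <= self.east


@dataclass
class NewsStory:
    """A story whose place names are assumed to be extracted and geocoded."""
    story_id: str
    # Each entry: (place name, latitude, longitude)
    locations: list = field(default_factory=list)


def stories_in_region(stories, box):
    """Return IDs of stories that mention at least one place inside `box`."""
    return [
        s.story_id
        for s in stories
        if any(box.contains(lat, lon) for _, lat, lon in s.locations)
    ]


# Usage example with made-up stories and coordinates.
stories = [
    NewsStory("s1", [("Pittsburgh", 40.44, -79.99)]),
    NewsStory("s2", [("Paris", 48.86, 2.35)]),
    NewsStory("s3", [("Boston", 42.36, -71.06), ("Tokyo", 35.68, 139.69)]),
]
northeast_us = BoundingBox(south=38.0, west=-82.0, north=45.0, east=-69.0)
print(stories_in_region(stories, northeast_us))  # ['s1', 's3']
```

A linear scan keeps the sketch short; an archive of many thousands of stories would more plausibly store the geocoded points in a spatial index so that a region query does not touch every story.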

Howard D. Wactlar | Alexander G. Hauptmann | Michael G. Christel | Ricky Houghton | Andreas M. Olligschlaeger

[1] S. Chiba, et al. Dynamic programming algorithm optimization for spoken word recognition, 1978.

[2] Alexander G. Hauptmann, et al. Learning to Recognize Speech by Watching Television, 1999, IEEE Intell. Syst.

[3] Angela Lee, et al. Perspectives on … Environmental Systems Research Institute, Inc., 1997.

[4] Yihong Gong, et al. Lessons Learned from Building a Terabyte Digital Video Library, 1999, Computer.

[5] Ricky Houghton. Named Faces: Putting Names to Faces, 1999, IEEE Intell. Syst.

[6] Ralph Weischedel, et al. Named Entity Extraction from Speech, 1998.

[7] Takeo Kanade, et al. Human Face Detection in Visual Scenes, 1995, NIPS.

[8] Stanley F. Chen, et al. Language and Pronunciation Modeling in the CMU 1996 Hub 4 Evaluation, 1999.

[9] Ellen K. Hughes, et al. Video OCR for digital news archive, 1998, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.

[10] Alexander G. Hauptmann, et al. Informedia: news-on-demand multimedia information acquisition and retrieval, 1997.

[11] Steve Young, et al. Spoken language systems technology workshop, 1995.