Pictorial transcripts: multimedia processing applied to digital library creation

This paper describes a working system for the automated archiving and selective retrieval of the textual, pictorial and auditory information contained in video programs. Video processing represents the visual information using a small subset of the video frames. Linguistic processing refines the closed caption text, generates a table of contents, and creates links to relevant multimedia documents. Audio and video information are compressed and indexed based on their temporal association with the selected video frames and the processed text. The derived information is used to automatically generate a hypermedia rendition of the program contents. This rendition provides a compact representation of the information contained in the video program and also serves as a textual and pictorial index for selective retrieval of the full-motion video. The fully automatic system generates HyperText Markup Language (HTML) renditions of television programs and makes them available for access over the Internet within seconds of their broadcast. The digital library currently contains over 2200 hours of television programs.
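The temporal association between selected frames and caption text can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the function names, the input layout (sorted key-frame timestamps, image file names, and timestamped caption strings), and the HTML structure are all assumptions made for illustration. Each caption is attached to the most recent selected frame at or before its timestamp, and the result is rendered as a simple HTML page.

```python
from bisect import bisect_right
from html import escape


def group_captions_by_frame(frame_times, captions):
    """Attach each caption to the latest key frame at or before its timestamp.

    frame_times: key-frame timestamps in seconds, sorted ascending (assumed).
    captions: list of (timestamp_seconds, text) pairs (assumed layout).
    """
    groups = [[] for _ in frame_times]
    for t, text in captions:
        i = bisect_right(frame_times, t) - 1
        # Captions that precede the first frame are attached to the first frame.
        groups[max(i, 0)].append(text)
    return groups


def render_html(title, frame_times, frame_files, captions):
    """Build a hypothetical pictorial-transcript page: one image per key frame,
    followed by the caption text associated with it."""
    groups = group_captions_by_frame(frame_times, captions)
    parts = [
        f"<html><head><title>{escape(title)}</title></head><body>",
        f"<h1>{escape(title)}</h1>",
    ]
    for t, img, texts in zip(frame_times, frame_files, groups):
        parts.append(f'<p><img src="{escape(img)}" alt="frame at {t:.1f} s"></p>')
        if texts:
            parts.append(f"<p>{escape(' '.join(texts))}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Illustrative data only; not taken from the system described above.
    times = [0.0, 12.5, 30.0]
    files = ["f0.jpg", "f1.jpg", "f2.jpg"]
    caps = [(1.0, "Good evening."), (13.0, "In other news."), (31.0, "Weather & sports.")]
    print(render_html("Evening News", times, files, caps))
```

Binary search keeps the assignment cheap even for long programs, and escaping the caption text prevents characters such as `&` from breaking the generated markup.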
