Paper information - Text, Speech, and Vision for Video Segmentation: The Informedia™ Project

We describe three technologies involved in creating a digital video library suitable for full-content search and retrieval. Image processing analyzes scenes, speech processing transcribes the audio signal, and natural language processing determines word relevance. The integration of these technologies enables us to include vast amounts of video data in the library.

Alexander G. Hauptmann | Michael H. Smith

[1] Takeo Kanade, et al. Informedia Digital Video Library, 1995, CACM.

[2] Michael McGill, et al. Introduction to Modern Information Retrieval, 1983.

[3] Alexander I. Rudnicky. Language Modeling with Limited Domain Data, 1995.

[4] Yoshinobu Tonomura, et al. Video tomography: an efficient method for camerawork extraction and motion analysis, 1994, MULTIMEDIA '94.

[5] Michael Loren Mauldin, et al. Information retrieval by text skimming, 1989.

[6] M. Smith, et al. Video Skimming for Quick Browsing based on Audio and Image Characterization, 1995.

[7] Mei-Yuh Hwang, et al. Improving speech recognition performance via phone-dependent VQ codebooks and adaptive language models in SPHINX-II, 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[8] Howard D. Wactlar, et al. Informedia: improving access to digital video, 1994, INTR.