Speech recognition for a digital video library

The standard method for making the full content of audio and video material searchable is to annotate it with human-generated metadata that describes the content in a way the search engine can understand, as is done in the creation of multimedia CD-ROMs. For the huge amounts of data that could usefully be included in digital video and audio libraries, however, the cost of producing this metadata is prohibitive. In the Informedia Digital Video Library, the production of the metadata supporting the library interface is automated using techniques derived from artificial intelligence (AI) research. By applying speech recognition together with natural language processing, information retrieval, and image analysis, an interface has been produced that helps users locate the information they want and navigate or browse the digital video library more effectively. Specific interface components include automatic titles, filmstrips, video skims, word location marking, and representative frames for shots. Both the user interface and the information retrieval engine within Informedia are designed for use with automatically derived metadata, much of which depends on speech recognition for its production. Experimental information retrieval results are given that support a basic premise of the Informedia project: that speech-recognition-generated transcripts can make multimedia material searchable. The Informedia project emphasizes the integration of speech recognition, image processing, natural language processing, and information retrieval to compensate for the deficiencies of these individual technologies. © 1998 John Wiley & Sons, Inc.
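The premise that timed recognizer transcripts make video searchable can be illustrated with a minimal sketch. This is not Informedia's actual implementation; the data structures and word timings below are hypothetical, showing only how an inverted index over (word, timestamp) pairs supports word location marking in a video.

```python
# Hypothetical sketch: index a timed speech-recognition transcript so that
# query words can be mapped back to their locations in the video.
from collections import defaultdict

def build_index(timed_words):
    """timed_words: list of (word, start_seconds) pairs from a recognizer."""
    index = defaultdict(list)
    for word, start in timed_words:
        index[word.lower()].append(start)  # case-insensitive lookup
    return index

def locate(index, query):
    """Return the timestamps (word locations) for each query term."""
    return {term: index.get(term.lower(), []) for term in query.split()}

# Illustrative transcript fragment with made-up timings.
transcript = [("the", 0.0), ("Informedia", 0.4), ("digital", 1.1),
              ("video", 1.6), ("library", 2.0), ("video", 7.3)]
idx = build_index(transcript)
print(locate(idx, "video library"))  # "video" at 1.6 s and 7.3 s, "library" at 2.0 s
```

In a full system the timestamps would come from the recognizer's word alignments, letting the interface jump playback directly to each match rather than returning whole documents.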
