Open-vocabulary speech indexing for voice and video mail retrieval

This paper describes recent work on a multimedia retrieval project at Cambridge University and Olivetti Research Limited (ORL). We present novel techniques that allow extremely rapid audio indexing, at rates approaching several thousand times real time. Unlike other methods, these techniques do not depend on a fixed-vocabulary recognition system or on keywords that must be known in advance. Using statistical methods developed for text, these indexing techniques allow rapid and efficient retrieval and browsing of audio and video documents. The paper covers the project background, the indexing and retrieval techniques, and a video mail retrieval application incorporating content-based audio indexing, retrieval, and browsing.
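
As a rough illustration of how statistical methods developed for text can be applied once query words have been located in audio, the sketch below ranks messages by a weighted count of putative word hits. The abstract does not say which weighting scheme the project uses, so the inverse-document-frequency weight here is an assumption borrowed from generic text retrieval. The function name `rank_documents` and the `hits` data layout are hypothetical.

```python
import math


def rank_documents(hits, query_terms):
    """Rank audio documents by an IDF-weighted sum of spotted query-term hits.

    hits: dict mapping document id -> dict mapping term -> number of putative hits
          (assumed to come from some upstream word-spotting step, not shown here)
    query_terms: list of query words
    """
    n_docs = len(hits)
    scores = {doc_id: 0.0 for doc_id in hits}
    for term in query_terms:
        # Documents in which the term was spotted at least once.
        docs_with_term = [d for d, counts in hits.items() if counts.get(term, 0) > 0]
        if not docs_with_term:
            continue
        # Assumed weighting: rarer terms contribute more to the score.
        idf = math.log(n_docs / len(docs_with_term))
        for d in docs_with_term:
            scores[d] += hits[d][term] * idf
    # Highest-scoring documents first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical hit counts for three voice mail messages.
example_hits = {
    "msg1": {"meeting": 2, "budget": 1},
    "msg2": {"meeting": 1},
    "msg3": {},
}
ranking = rank_documents(example_hits, ["meeting", "budget"])
```

Because the query words are only needed at scoring time, nothing in this step requires a vocabulary fixed in advance; the hit counts could be produced for any word the user types.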
