Approaches to reduce the effects of OOV queries on indexed spoken audio

We present several novel approaches to the out-of-vocabulary (OOV) query problem for spoken audio: indexing based on syllable-like units called particles, and query expansion according to acoustic confusability for a word index. We also examine linear and OOV-based combinations of indexing schemes. We experiment on 75 hours of broadcast news, comparing our techniques to a word index, a phoneme index, and a phoneme index queried with phoneme sequences. Our results show that, for OOV words, our approaches are superior to both a word index and a phoneme index and perform comparably to the phoneme sequence scheme. The particle system performs worse than the acoustic query expansion scheme. The best system uses word queries for in-vocabulary words and a linear combination of the phoneme sequence scheme and acoustic query expansion for OOV words. With the best possible weights for the linear combination, this system improves average precision from 0.35 for a word index to 0.40, a result obtainable only if the weights can be learnt on a development query set. The next best system uses a word index for in-vocabulary words and the phoneme sequence system otherwise, and has an average precision of 0.39.
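The OOV-based combination described above can be illustrated with a minimal sketch. It assumes each indexing scheme returns a map from document id to a retrieval score; the function names, the score maps, and the default weight of 0.5 are all hypothetical and are not taken from the paper. In-vocabulary queries are answered from the word index alone, while OOV queries receive a weighted sum of the phoneme sequence scores and the acoustic query expansion scores.

```python
from typing import Dict, Set

Scores = Dict[str, float]  # document id -> retrieval score (assumed representation)


def linear_combination(a: Scores, b: Scores, weight: float) -> Scores:
    """Fuse two score maps as weight*a + (1 - weight)*b.

    A document missing from one map is treated as scoring 0.0 there.
    """
    docs = set(a) | set(b)
    return {
        d: weight * a.get(d, 0.0) + (1.0 - weight) * b.get(d, 0.0)
        for d in docs
    }


def combined_scores(
    query_word: str,
    vocabulary: Set[str],
    word_scores: Scores,
    phoneme_seq_scores: Scores,
    expansion_scores: Scores,
    weight: float = 0.5,  # hypothetical value; would be tuned on development queries
) -> Scores:
    """Route by vocabulary membership, then fuse scores for OOV queries."""
    if query_word in vocabulary:
        # In-vocabulary: trust the word index alone.
        return dict(word_scores)
    # OOV: linear combination of the two OOV-capable schemes.
    return linear_combination(phoneme_seq_scores, expansion_scores, weight)
```

In practice the weight would be chosen on a held-out development query set, which is the condition under which the abstract reports its best figure.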
