Automatic Language Model Adaptation for Spoken Document Retrieval

This paper describes experiments conducted at NIST on adapting language models over time to improve recognition of broadcast news recorded over many months. The experiments were designed specifically to improve the utility of automatically generated transcripts for retrieval applications. To evaluate the potential of the approach, a time-adaptive automatic speech recognition run was implemented to support the 1999 TREC Spoken Document Retrieval (SDR) Track, which covered more than 500 hours of broadcast news sampled across 5 months. The retrieval accuracy of several systems using the time-adaptive transcripts was compared with their accuracy using transcripts produced by virtually the same recognition system with a fixed language model. This paper details the process we employed to identify and implement the time-adaptive language model and discusses the results of the experiment in terms of its effect on word error rate, out-of-vocabulary rate, and retrieval accuracy (Mean Average Precision).
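For readers unfamiliar with two of the measures named above, the following is a minimal sketch of how out-of-vocabulary rate and Mean Average Precision are commonly computed. It is not the scoring software used in the evaluation; the function names and the toy inputs are hypothetical, and the definitions assumed are the standard ones (OOV rate as the fraction of reference word tokens absent from the recognizer vocabulary, and uninterpolated average precision averaged over topics).

```python
from typing import Iterable, Sequence, Set, Tuple


def oov_rate(reference_words: Sequence[str], vocabulary: Set[str]) -> float:
    """Fraction of reference word tokens not covered by the vocabulary."""
    if not reference_words:
        return 0.0
    missing = sum(1 for word in reference_words if word not in vocabulary)
    return missing / len(reference_words)


def average_precision(ranked_docs: Sequence[str], relevant: Set[str]) -> float:
    """Uninterpolated average precision for one topic's ranked list."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank where a relevant document is found.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    topics: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Mean of per-topic average precision over (ranking, relevant set) pairs."""
    topic_list = list(topics)
    if not topic_list:
        return 0.0
    total = sum(average_precision(ranking, rel) for ranking, rel in topic_list)
    return total / len(topic_list)
```

Under these assumptions, a time-adaptive vocabulary that admits newly emerging words would lower `oov_rate` on later broadcasts, and any resulting effect on retrieval would appear as a change in `mean_average_precision` over the topic set.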
