Dynamic language modeling for broadcast news

ABSTRACT

This paper describes recent experiments on unsupervised language model adaptation for the transcription of broadcast news data. In previous work, a framework for automatically selecting adaptation data using information retrieval techniques was proposed. This work extends that method and presents experimental results with unsupervised language model adaptation. Three primary aspects are considered: (1) the performance of five widely used LM adaptation methods using the same adaptation data is compared; (2) the influence of the temporal distance between the training and test data epochs on adaptation efficiency is assessed; and (3) show-based language model adaptation is compared with story-based language model adaptation. Experiments have been carried out on broadcast news transcription in English and Mandarin Chinese. With story-based MDI adaptation, a relative word error rate reduction of 4.7% was obtained in English and a relative character error rate reduction of 5.6% in Mandarin.

1. INTRODUCTION

While n-gram models are used successfully in speech recognition, their performance suffers from any mismatch between the training and test data [7]. The idea of language model (LM) adaptation is to use a small amount of domain-specific data to adjust the LM so as to reduce the impact of linguistic differences between the training and test data. Different schemes for LM adaptation have been proposed, such as the cache model, based on the observation that a word that has occurred in recent text has a higher probability of being seen again [9]; the trigger model, which uses trigger word pairs to capture semantic information [10]; and structured LMs [1].

Broadcast news (BN) transcription is a complicated task for both acoustic and language modeling. The linguistic attributes of BN data are complex, arising from the many different speaking styles, ranging from spontaneous conversation to prepared speech (close in style to written texts). The content of BN data is open, and any given BN show covers multiple topics. As a consequence, it is difficult to predict the topics of a BN show without looking at the data itself. The only information available for a show is the hypotheses output by the speech recognizer. However, for any given broadcast, the hypothesized transcript is quite short and contains recognition errors, so the transcripts alone are not sufficient for use as an adaptation corpus. Information retrieval (IR) methods provide a means to address this problem. Instead of directly using the ASR hypotheses for LM adaptation, they can be used as queries to an IR system in order to select additional on-topic adaptation data from a large general corpus. This approach reduces the effect of transcription errors in the hypotheses and at the same time provides substantially more textual data for LM estimation; the selection step is sketched below.

In this paper, a series of experiments is presented exploring the general framework of unsupervised LM adaptation using IR methods [3]. The performance of a variety of popular LM adaptation techniques using automatically selected adaptation data is compared. The investigated techniques are linear interpolation, maximum a posteriori (MAP) adaptation, mixture models, dynamic mixture models, and minimum discrimination information (MDI) adaptation. The effect of the temporal distance between the epoch of the adaptation corpus and the epoch of the test data is also assessed.
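As an illustration of the selection step described above, the following minimal sketch treats the first-pass ASR hypothesis as a query against a general text corpus and keeps the top-ranked articles as adaptation data. The TF-IDF weighting, cosine ranking, and cutoff k are assumptions of the sketch, not necessarily the weighting used in the system described here.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build sparse TF-IDF vectors for a list of tokenized documents."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))                      # document frequency per word
    n = len(docs)
    idf = {w: math.log(n / df[w]) for w in df}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({w: c * idf[w] for w, c in tf.items()})
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def select_adaptation_data(hypothesis, corpus, k=100):
    """Use the (error-prone) ASR hypothesis as an IR query and return the
    k most similar corpus articles as LM adaptation data."""
    vectors, idf = tf_idf_vectors(corpus)
    query = {w: c * idf.get(w, 0.0) for w, c in Counter(hypothesis).items()}
    ranked = sorted(range(len(corpus)),
                    key=lambda i: cosine(query, vectors[i]), reverse=True)
    return [corpus[i] for i in ranked[:k]]
```

Because the hypothesis contributes only aggregate word statistics to the query, individual recognition errors have limited influence on which articles are retrieved.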
A given BN show typically covers several stories, each related to a different topic. To address this changing nature of BN data, static and dynamic models for LM adaptation are investigated. In static modeling, the LM is updated once for the whole show, which means that a single LM must simultaneously fit multiple topics. In dynamic modeling, the LM is updated at each automatically detected story change, which entails estimating multiple story-based LMs for each BN show. Experiments are carried out on BN transcription in American English and Mandarin Chinese.
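As a concrete illustration of the dynamic variant, the sketch below rescales a background n-gram distribution toward story-level unigram statistics, in the spirit of MDI estimation [8]. The dict-based interfaces, the tempering exponent gamma, and the helper names are assumptions of the sketch, not the paper's actual implementation.

```python
from collections import Counter

def unigram_probs(tokens):
    """Maximum-likelihood unigram estimates from a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def mdi_scale(p_bg_ngram, p_bg_uni, p_ad_uni, gamma=0.5):
    """Rescale a background n-gram distribution p(w|h) by the factor
    (p_adapt(w) / p_background(w)) ** gamma and renormalize, i.e. MDI-style
    adaptation with unigram features. gamma < 1 tempers the shift toward
    the small adaptation sample; words unseen in the adaptation data keep
    their background probability (scaling factor 1)."""
    scaled = {w: p * (p_ad_uni.get(w, p_bg_uni[w]) / p_bg_uni[w]) ** gamma
              for w, p in p_bg_ngram.items()}
    z = sum(scaled.values())
    return {w: p / z for w, p in scaled.items()}

def story_adapted_lm(background, p_bg_uni, story_tokens, gamma=0.5):
    """Dynamic (story-based) adaptation: build one adapted LM per detected
    story. `background(h)` is assumed to return the background distribution
    p(w | h) as a dict; this interface is an assumption of the sketch."""
    p_ad_uni = unigram_probs(story_tokens)
    return lambda h: mdi_scale(background(h), p_bg_uni, p_ad_uni, gamma)
```

Under this framing, show-based (static) adaptation amounts to calling story_adapted_lm once with all of the show's selected text, while the dynamic variant repeats it at every automatically detected story change.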

REFERENCES

[1] Jean-Luc Gauvain et al., "Unsupervised language model adaptation for broadcast news," IEEE ICASSP, 2003.

[2] Jean-Luc Gauvain et al., "Broadcast news transcription in Mandarin," INTERSPEECH, 2000.

[3] Reinhard Kneser et al., "On the dynamic adaptation of stochastic language models," IEEE ICASSP, 1993.

[4] Mari Ostendorf et al., "Relevance weighting for combining multi-domain data for n-gram language modeling," Comput. Speech Lang., 1999.

[5] Frederick Jelinek et al., "Structured language modeling," Comput. Speech Lang., 2000.

[6] Ronald Rosenfeld et al., "A maximum entropy approach to adaptive statistical language modelling," Comput. Speech Lang., 1996.

[7] Marcello Federico et al., "Bayesian estimation methods for n-gram language model adaptation," ICSLP, 1996.

[8] Marcello Federico et al., "Efficient language model adaptation through MDI estimation," EUROSPEECH, 1999.

[9] E. W. D. Whittaker, "Temporal adaptation of language models," 2004.

[10] Jean-Luc Gauvain et al., "The LIMSI Broadcast News transcription system," Speech Commun., 2002.

[11] Renato De Mori et al., "A Cache-Based Natural Language Model for Speech Recognition," IEEE Trans. Pattern Anal. Mach. Intell., 1990.