Using information retrieval methods for language model adaptation

In this paper we report experiments on language model adaptation using information retrieval methods, drawing upon recent developments in information extraction and topic tracking. One of the main problems in adapting a language model for speech recognition is extracting reliable topic information from the audio signal in the presence of recognition errors. Work in the information retrieval domain on information extraction and topic tracking suggests a new way to address this problem. We use information retrieval methods to extract topic information from the word recognizer hypotheses, and this information is then used to automatically select adaptation data from a very large general text corpus. Two adaptive language models, a mixture-based model and a MAP-based model, were investigated using the adaptation data. Experiments carried out with the LIMSI Mandarin broadcast news transcription system give a relative character error rate reduction of 4.3% with this adaptation method.
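The pipeline described above (recognizer hypothesis, then retrieval of topically similar text, then an adapted model) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the system reported here: the similarity measure is a plain tf-idf cosine, the language models are add-alpha smoothed unigrams rather than n-grams, the toy corpus is invented, and the function names (select_adaptation_docs, unigram_lm, mixture_lm) and the interpolation weight are hypothetical. Only the mixture-based variant is sketched; the MAP-based model is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return tf-idf vectors for tokenized documents, plus the idf table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf so that no term in the corpus gets a zero weight.
    idf = {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_adaptation_docs(hypothesis, corpus, top_k):
    """Rank corpus documents by similarity to the hypothesis; return top_k indices."""
    vecs, idf = tfidf_vectors(corpus)
    # Terms unseen in the corpus (e.g. misrecognized words) get weight 0.
    query = {t: tf * idf.get(t, 0.0) for t, tf in Counter(hypothesis).items()}
    ranked = sorted(range(len(corpus)),
                    key=lambda i: cosine(query, vecs[i]), reverse=True)
    return ranked[:top_k]


def unigram_lm(docs, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model over a fixed vocabulary."""
    counts = Counter(t for doc in docs for t in doc)
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def mixture_lm(general, adapted, lam):
    """Linear interpolation: lam * adapted + (1 - lam) * general."""
    return {t: lam * adapted[t] + (1.0 - lam) * general[t] for t in general}


# Toy general corpus (invented data, already tokenized).
corpus = [
    ["stock", "market", "shares", "fell"],
    ["football", "match", "goal", "score"],
    ["market", "trading", "shares", "rose"],
    ["weather", "rain", "storm", "wind"],
]
# Hypothetical recognizer output containing one out-of-corpus word ("fall").
hypothesis = ["shares", "market", "trading", "fall"]

selected = select_adaptation_docs(hypothesis, corpus, top_k=2)
vocab = {t for doc in corpus for t in doc}
general = unigram_lm(corpus, vocab)
adapted = unigram_lm([corpus[i] for i in selected], vocab)
mixed = mixture_lm(general, adapted, lam=0.5)
```

In this sketch the erroneous word contributes nothing to the query, so the ranking is driven by the correctly recognized topical words, and the interpolated model shifts probability mass toward the vocabulary of the selected documents. How the real system weights terms, how much data it selects, and how the interpolation weight is set are not stated in the abstract.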
