Unsupervised language model adaptation methods for spontaneous speech

In this paper we examine the performance of three unsupervised language model adaptation schemes applied to speech recognition of spontaneous lecture presentations. Two of the schemes have been described previously in the literature, while the third is a variation of one of the other two. All three schemes are based on a combination of word n-gram and class n-gram models and use an initial transcription hypothesis to adapt the parameters of the class model. In each case the adapted class model is linearly interpolated with the baseline word n-gram model, and the combination is then applied in a subsequent recognition step. One of the schemes also contains an element of domain adaptation: the transcription hypothesis is additionally used to determine the interpolation weights of several class models, each of which is built on automatically derived clusters of presentations. We also investigate multi-pass adaptation for each scheme and show that it gives additional improvements in performance. Relative improvements in word error rate of up to 12.5% (3.4% absolute) are obtained on a held-out test set with the best adaptation scheme.
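
The interpolation step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the toy vocabulary and class map, the add-alpha smoothing, and the choice to re-estimate only the word-given-class probabilities from the hypothesis are all assumptions made for illustration, and the abstract does not specify which class-model parameters each scheme adapts.

```python
from collections import Counter


def adapt_word_given_class(hypothesis, word2class, alpha=0.1):
    """Re-estimate P(w | class(w)) from a recognition hypothesis.

    Add-alpha smoothing is an assumption of this sketch, used so that
    words absent from the hypothesis keep a non-zero probability.
    """
    counts = Counter(w for w in hypothesis if w in word2class)
    class_totals = Counter()  # hypothesis tokens observed per class
    class_sizes = Counter()   # vocabulary words per class
    for word, cls in word2class.items():
        class_totals[cls] += counts[word]
        class_sizes[cls] += 1
    return {
        word: (counts[word] + alpha)
        / (class_totals[cls] + alpha * class_sizes[cls])
        for word, cls in word2class.items()
    }


def class_bigram_prob(prev, word, word2class, class_trans, word_given_class):
    """Class bigram: P(class(w) | class(prev)) * P(w | class(w))."""
    transition = class_trans[(word2class[prev], word2class[word])]
    return transition * word_given_class[word]


def interpolate(p_word, p_class, lam):
    """Linear interpolation of word n-gram and class n-gram probabilities."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("interpolation weight must lie in [0, 1]")
    return lam * p_class + (1.0 - lam) * p_word


# Toy data (hypothetical): two classes, four words.
word2class = {"speech": "NOUN", "language": "NOUN", "the": "DET", "a": "DET"}
class_trans = {
    ("DET", "NOUN"): 0.9, ("DET", "DET"): 0.1,
    ("NOUN", "DET"): 0.6, ("NOUN", "NOUN"): 0.4,
}
hypothesis = ["the", "speech", "the", "speech", "language"]

adapted = adapt_word_given_class(hypothesis, word2class)
p_class = class_bigram_prob("the", "speech", word2class, class_trans, adapted)
p_word = 0.05  # placeholder baseline word bigram probability P(speech | the)
p_combined = interpolate(p_word, p_class, lam=0.5)
```

In a multi-pass setting, the hypothesis produced by decoding with the combined model would be fed back into `adapt_word_given_class` for the next pass; the interpolation weight `lam` would normally be tuned on held-out data rather than fixed at 0.5.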
