Unsupervised Language and Acoustic Model Adaptation for Cross Domain Portability

This work investigates the task of porting a broadcast news recognition system to a conversational speech domain for which only untranscribed acoustic data are available. An iterative adaptation procedure is proposed that alternately generates automatic transcriptions of the speech data and adapts the acoustic and language models on them. The procedure was applied to a tourist-information conversational domain, for which 8 hours of audio data were available for development and 2 hours for testing. On the test set, the broadcast news system yields a word error rate (WER) of 51.0%, while a task-specific system achieves a WER of 21.2%. Unsupervised porting experiments reduced the gap between these two reference systems by 61%.
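The 61% figure is a relative reduction of the WER gap between the broadcast news system (51.0%) and the task-specific system (21.2%), a gap of 29.8 points. Closing 61% of it removes about 0.61 × 29.8 ≈ 18.2 points, which implies an adapted WER of roughly 32.8%. That WER is not stated in this summary; it is back-computed here and is approximate because 61% is itself rounded. The helper names below are hypothetical and only illustrate the arithmetic.

```python
def gap_reduction(baseline_wer: float, target_wer: float, adapted_wer: float) -> float:
    """Fraction of the baseline-to-target WER gap closed by an adapted system."""
    gap = baseline_wer - target_wer
    return (baseline_wer - adapted_wer) / gap


def implied_wer(baseline_wer: float, target_wer: float, reduction: float) -> float:
    """WER implied by closing a given fraction of the baseline-to-target gap."""
    return baseline_wer - reduction * (baseline_wer - target_wer)


BN_WER = 51.0    # broadcast news system on the test set (%)
TASK_WER = 21.2  # task-specific system on the test set (%)

gap = BN_WER - TASK_WER                    # absolute gap in WER points
wer = implied_wer(BN_WER, TASK_WER, 0.61)  # approximate, since 61% is rounded
print(f"gap: {gap:.1f} points")
print(f"implied adapted WER: about {wer:.1f}%")
```

The two functions are inverses of each other, so feeding the implied WER back into `gap_reduction` recovers the 0.61 fraction.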
