Paper information - Cross-task portability of a broadcast news speech recognition system

This paper reports on experiments in porting the ITC-irst Italian broadcast news recognition system to two spontaneous dialogue domains. Porting was investigated by applying state-of-the-art adaptation methods to the acoustic and language models, and by evaluating the trade-off between performance and the required amount of task-specific annotated data. The use of different levels of supervision for acoustic model adaptation was also studied. With 2 h of manually annotated speech, the adapted systems achieved word error rates of 26.0% and 28.4%. These results are to be compared with the performance of two domain-specific baseline systems, 22.6% and 21.2%, respectively, which were developed on much more training data. Finally, a robust method is presented that allows tuning the insertion of spontaneous speech phenomena by the speech decoder.
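To make the comparison in the abstract concrete, the short sketch below computes the gap between each adapted system and its domain-specific baseline, both in absolute WER points and as a relative increase. The four WER figures are taken from the abstract; the labels `domain_1` and `domain_2` and the helper `wer_gap` are illustrative names, not anything defined by the paper.

```python
def wer_gap(adapted, baseline):
    """Return (absolute gap in WER points, relative increase in percent)."""
    absolute = adapted - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


# (adapted WER, baseline WER) pairs as quoted in the abstract.
# The domain labels are placeholders; the abstract does not name the domains.
results = {
    "domain_1": (26.0, 22.6),
    "domain_2": (28.4, 21.2),
}

gaps = {name: wer_gap(adapted, baseline)
        for name, (adapted, baseline) in results.items()}

for name, (abs_gap, rel_gap) in gaps.items():
    print(f"{name}: +{abs_gap} WER points, +{rel_gap}% relative to baseline")
```

Under these figures, the adapted systems trail their baselines by 3.4 and 7.2 WER points, roughly 15% and 34% relative, while using only 2 h of annotated speech.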

Fabio Brugnara | Mauro Cettolo | Marcello Federico | Diego Giuliani | Nicola Bertoldi
