Cross-task portability of a broadcast news speech recognition system

This paper reports on experiments in porting the ITC-irst Italian broadcast news recognition system to two spontaneous dialogue domains. Portability was investigated by applying state-of-the-art adaptation methods to the acoustic and language models, and by evaluating the trade-off between performance and the amount of task-specific annotated data required. The use of different levels of supervision for acoustic model adaptation was also studied. Using 2 h of manually annotated speech, the adapted systems achieved word error rates of 26.0% and 28.4%. These results should be compared with the word error rates of two domain-specific baseline systems, 22.6% and 21.2% respectively, which were developed on much more training data. Finally, a robust method is presented that allows the insertion of spontaneous speech phenomena by the speech decoder to be tuned.
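As a reading aid for the figures above, the following Python sketch computes word error rate (WER) with a standard word-level Levenshtein alignment, and the absolute and relative gaps between each adapted system and its domain-specific baseline. It is a minimal illustration, not the scoring tool used in the paper; the function names, the toy Italian utterance, and the labels "task A" and "task B" (the abstract does not name the two dialogue domains) are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein alignment (illustrative helper)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution or exact match
        prev = curr
    return prev[-1] / len(ref)


def relative_gap(adapted: float, baseline: float) -> float:
    """Relative WER increase of an adapted system over its baseline."""
    return (adapted - baseline) / baseline


# WERs (%) reported in the abstract as (adapted with 2 h of data, baseline);
# "task A"/"task B" are placeholder names for the two dialogue domains.
REPORTED = {"task A": (26.0, 22.6), "task B": (28.4, 21.2)}

if __name__ == "__main__":
    # Toy utterance: one deleted word ("una") and one inserted filler ("eh").
    print(word_error_rate("vorrei prenotare una camera doppia",
                          "vorrei prenotare camera doppia eh"))
    for name, (adapted, baseline) in REPORTED.items():
        print(f"{name}: +{adapted - baseline:.1f} points absolute, "
              f"{relative_gap(adapted, baseline):.1%} relative")
```

On the reported numbers, the adapted systems trail their baselines by 3.4 points (about 15% relative) and 7.2 points (about 34% relative).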
