Paper information - Recent advances in broadcast news transcription

Recent advances in broadcast news transcription

This paper describes recent advances in the CU-HTK Broadcast News English (BN-E) transcription system and its performance in the DARPA/NIST Rich Transcription 2003 Speech-to-Text (RT-03) evaluation. Heteroscedastic linear discriminant analysis (HLDA) and discriminative training, which were previously developed in the context of the recognition of conversational telephone speech, have been successfully applied to the BN-E task for the first time. A number of new features have also been added. These include gender-dependent (GD) discriminative training and modified discriminative training using lattice regeneration and combination. On the 2003 evaluation set, the system gave an overall word error rate of 10.7% in less than 10 times real time (10×RT).
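The two figures quoted in the abstract, word error rate and the real-time factor, can be illustrated with a minimal sketch. This is not the scoring used in the RT-03 evaluation: the function names `word_error_rate` and `real_time_factor` are hypothetical, words are split on whitespace only, and no text normalisation is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenisation and exact string matching; any
    normalisation applied by an official scoring tool is omitted here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration (e.g. 10.0 means 10xRT)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.1%}")
    print(f"Real-time factor: {real_time_factor(3000.0, 600.0):.1f}xRT")
```

Under these definitions, a system that runs in "less than 10×RT" spends under ten seconds of processing per second of audio, and a word error rate of 10.7% corresponds to roughly one error (substitution, deletion or insertion) for every nine to ten reference words.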
