Recent advances in broadcast news transcription

This paper describes recent advances in the CU-HTK Broadcast News English (BN-E) transcription system and its performance in the DARPA/NIST Rich Transcription 2003 Speech-to-Text (RT-03) evaluation. Heteroscedastic linear discriminant analysis (HLDA) and discriminative training, both previously developed for the recognition of conversational telephone speech, have been successfully applied to the BN-E task for the first time. A number of new features have also been added, including gender-dependent (GD) discriminative training and modified discriminative training using lattice regeneration and combination. On the 2003 evaluation set, the system gave an overall word error rate of 10.7% while running in less than 10 times real time (10×RT).
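For readers unfamiliar with the two figures quoted above, the following is a minimal sketch of how a word error rate and a real-time factor are conventionally computed. It is not the scoring procedure used in the RT-03 evaluation, which relies on NIST scoring tools and text normalisation rules not described here; the function names `word_error_rate` and `real_time_factor` are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Illustrative only: whitespace tokenisation, no text normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions needed to reach an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration (the 'x' in 'xRT')."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds
```

Under these definitions, a system that needs 3000 seconds to transcribe 600 seconds of audio runs at 5×RT, which would fall within a 10×RT budget.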
