Paper information - Dragon Systems' 1998 broadcast news transcription system

In this paper we describe key improvements to Dragon’s Broadcast News Transcription System: the addition of a speaker-change detection algorithm to our preprocessing subsystem, a new diagonalizing transformation trained using semi-tied covariances, and the addition of probabilities on pronunciations. The new transcription system yields a word error rate of 15.2% on the 1997 evaluation test data and 14.5% on the 1998 evaluation test data.

[1] M. J. Hunt, et al. An investigation of PLP and IMELDA acoustic representations and of their potential for combination, 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[2] Puming Zhan, et al. Progress in Broadcast News transcription at Dragon Systems, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[3] Don McAllaster, et al. Improvements in recognition of conversational telephone speech, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[4] Mark J. F. Gales. Semi-tied covariance matrices, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[5] Jonathan G. Fiscus, et al. A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER), 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[6] Thomas Hain, et al. The 1997 HTK broadcast news transcription system, 1998.

[7] S. Wegmann, et al. Speaker normalization on conversational telephone speech, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[8] H. Hermansky, et al. Perceptual linear predictive (PLP) analysis of speech, 1990, The Journal of the Acoustical Society of America.

[9] R. Gopinath. Constrained maximum likelihood modeling with Gaussian distributions, 2001.

[10] Richard M. Schwartz, et al. A compact model for speaker-adaptive training, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[11] Larry Gillick, et al. Studies in transformation-based adaptation, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12] Mark J. F. Gales, et al. Broadcast news transcription using HTK, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[13] Richard M. Schwartz, et al. The 1996 BBN BYBLOS HUB-4 Transcription System, 1996.

[14] Philip C. Woodland, et al. Speaker adaptation of continuous density HMMs using multivariate linear regression, 1994, ICSLP.

[15] S. Wegmann, et al. Dragon Systems’ 1997 Mandarin Broadcast News System, 1997.

[16] Larry Gillick, et al. Progress in recognizing conversational telephone speech, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.