Dragon Systems' 1998 Broadcast News Transcription System

This paper describes key improvements to Dragon’s Broadcast News Transcription System: the addition of a speaker-change detection algorithm to the preprocessing subsystem, a new diagonalizing transformation trained using semi-tied covariances, and the addition of probabilities on pronunciations. The new transcription system yields a word error rate of 15.2% on the 1997 evaluation test data and 14.5% on the 1998 evaluation test data.
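For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used for the evaluations reported here; the function name `word_error_rate` and the whitespace tokenization are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    official scoring normally applies text normalization first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn the first i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Under this definition, a word error rate of 15.2% corresponds to roughly 15 word errors per 100 reference words.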
