Paper information - Progress in Broadcast News transcription at Dragon Systems

We report on progress in acoustic modelling and preprocessing in our Broadcast News transcription system. We have gone back to basics in acoustic modelling, and re-examined some of our standard practices, in particular the use of IMELDA and frequency warping, in the context of the Broadcast News corpus. We also report on some preliminary experiments with a generalization of IMELDA, "semi-tied covariances". In combination, these improvements lead to a 3.5% absolute improvement over our eval97 models. We also describe our attempts to fix our rather primitive, silence-based preprocessing system, including initial results using a new speaker-change detection algorithm based on Hotelling's T²-test.
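The abstract names Hotelling's T²-test but does not describe the detection algorithm itself. As a rough illustration only, the textbook two-sample T² statistic can be computed for two adjacent windows of feature frames as sketched below. The function names, the pooled-covariance form, and the use of plain Python lists are assumptions made for this sketch; they are not taken from the paper, and no decision threshold is implied.

```python
def mean_vector(frames):
    """Component-wise mean of a list of equal-length feature vectors."""
    n, d = len(frames), len(frames[0])
    return [sum(f[j] for f in frames) / n for j in range(d)]


def scatter_matrix(frames, mean):
    """Sum of outer products of deviations from the mean (unnormalized)."""
    d = len(mean)
    s = [[0.0] * d for _ in range(d)]
    for f in frames:
        diff = [f[j] - mean[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                s[a][b] += diff[a] * diff[b]
    return s


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def hotelling_t2(left, right):
    """Two-sample Hotelling T^2 statistic with a pooled covariance estimate.

    T^2 = (n1 * n2 / (n1 + n2)) * (m1 - m2)^T S_pooled^{-1} (m1 - m2)
    """
    n1, n2 = len(left), len(right)
    m1, m2 = mean_vector(left), mean_vector(right)
    s1, s2 = scatter_matrix(left, m1), scatter_matrix(right, m2)
    d = len(m1)
    pooled = [[(s1[a][b] + s2[a][b]) / (n1 + n2 - 2) for b in range(d)]
              for a in range(d)]
    diff = [m1[j] - m2[j] for j in range(d)]
    weighted = solve(pooled, diff)  # S_pooled^{-1} (m1 - m2)
    return (n1 * n2 / (n1 + n2)) * sum(diff[j] * weighted[j] for j in range(d))
```

A larger value indicates that the mean feature vectors of the two windows differ more relative to their pooled spread, which is what would suggest a candidate speaker change. How the windows are chosen and how the statistic is thresholded are left open here.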

Puming Zhan | Larry Gillick | Steven Wegmann

[1] Thomas Hain, et al. The 1997 HTK broadcast news transcription system, 1998.

[2] S. Wegmann, et al. Speaker normalization on conversational telephone speech, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[3] R. Gopinath. Constrained maximum likelihood modeling with Gaussian distributions, 2001.

[4] Mark J. F. Gales. Semi-tied covariance matrices, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98.

[5] S. Chen, et al. Speaker, environment and channel change detection and clustering via the Bayesian Information Criterion, 1998.

[6] M. J. Hunt, et al. An investigation of PLP and IMELDA acoustic representations and of their potential for combination, 1991, ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[7] Puming Zhan, et al. Dragon Systems' 1998 broadcast news transcription system, 1999, EUROSPEECH.

[8] Steve Young, et al. The development of the 1996 HTK broadcast news transcription system, 1996.

[9] Jonathan G. Fiscus, et al. A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER), 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[10] H. Hermansky, et al. Perceptual linear predictive (PLP) analysis of speech, 1990, The Journal of the Acoustical Society of America.