Cluster-dependent acoustic modeling [speech recognition applications]

We present cluster-dependent acoustic modeling for large-vocabulary speech recognition. Given large amounts of acoustic training data, we build multiple cluster-dependent models (CDMs), each focused on a group of speakers so that it represents speaker-dependent characteristics. The approach is motivated by the observation that a sufficiently trained speaker-dependent (SD) model outperforms a speaker-independent (SI) model. During decoding, the data of each test speaker is decoded with CDMs selected under certain criteria, with the aim of achieving high recognition accuracy. Several speaker clustering and model selection techniques are proposed and compared on a broadcast news (BN) transcription task. Compared with our baseline system on the EARS BN 2003 development test set, the CDMs provided a gain of more than 1% absolute in unadapted decoding and a 0.5% gain in adapted decoding.
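
The two stages described above, clustering the training speakers and then selecting clusters for a test speaker, can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or selection criteria were used, so everything here is an assumption made for illustration: each speaker is summarized by a single mean feature vector, speakers are grouped with plain k-means (using a deterministic farthest-point initialisation), and clusters are selected by Euclidean distance to the test speaker. The function names `cluster_speakers` and `select_clusters` are hypothetical. In a real system a separate acoustic model would be trained per cluster; the sketch stops at cluster assignment and selection.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rank_clusters(vector, centroids):
    """Cluster indices ordered from closest to farthest centroid."""
    return sorted(range(len(centroids)),
                  key=lambda k: sq_dist(vector, centroids[k]))


def cluster_speakers(speaker_means, num_clusters, num_iters=20):
    """Group training speakers with k-means (an assumed technique,
    not necessarily the one used in the paper)."""
    # Farthest-point initialisation: start from the first speaker, then
    # repeatedly add the speaker farthest from all chosen centroids.
    centroids = [list(speaker_means[0])]
    while len(centroids) < num_clusters:
        farthest = max(speaker_means,
                       key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    for _ in range(num_iters):
        labels = [rank_clusters(v, centroids)[0] for v in speaker_means]
        for k in range(num_clusters):
            members = [v for v, lab in zip(speaker_means, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = mean_vector(members)

    # Final assignment against the final centroids.
    labels = [rank_clusters(v, centroids)[0] for v in speaker_means]
    return centroids, labels


def select_clusters(test_speaker_mean, centroids, top_n=1):
    """Pick the top_n clusters closest to a test speaker (assumed criterion)."""
    return rank_clusters(test_speaker_mean, centroids)[:top_n]


# Toy usage: six training speakers forming two obvious groups.
training_speakers = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
                     [10.0, 10.1], [10.1, 10.0], [9.9, 10.0]]
centroids, labels = cluster_speakers(training_speakers, num_clusters=2)
chosen = select_clusters([9.8, 10.2], centroids, top_n=1)
```

Returning more than one cluster through `top_n` mirrors the abstract's wording that a test speaker may be decoded with several selected CDMs; how their outputs would be combined is not covered by the abstract and is left out here.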
