Maximum-likelihood updates of HMM duration parameters for discriminative continuous speech recognition

Previous studies showed that recognition performance can be significantly enhanced by incorporating information about HMM duration along with the cepstral parameters. The reestimation formulas for the duration parameters have been derived in the past using fixed segmentation during K-means training, and the duration statistics have always been held fixed throughout the additional minimum string error (MSE) training process. In this study, we update the duration parameters along with the other model parameters during the discriminative training iterations. The convergence property of the training procedure based on the MSE approach is investigated, and experimental results on a wireline connected digit recognition task demonstrate a 6% word error rate reduction from using the newly trained duration model parameters, as compared to fixed duration parameters during MSE training.
