We present a new view of hidden Markov model (HMM) state tying, showing that the accuracy of phonetically tied mixture (PTM) models is similar to, or better than, that of the more typical state-clustered HMM systems. The PTM models require fewer Gaussian distance computations during recognition, and can lead to recognition speedups. We describe a per-phone Gaussian clustering algorithm that automatically determines the number of Gaussians for each phone in the PTM model. Experimental results show that this method gives a substantial decrease in the number of Gaussians and a corresponding speedup with little degradation in accuracy. Finally, we study mixture weight thresholding algorithms to drastically decrease the number of mixture weights in the PTM model without degrading accuracy. More than a factor of 10 reduction in mixture weights is achieved with no degradation in performance.
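The mixture weight thresholding idea mentioned above can be illustrated with a minimal sketch: weights below a cutoff are zeroed and the remainder renormalized so each state's mixture still sums to one. The function name, the threshold value, and the fallback for the all-pruned case are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def threshold_mixture_weights(weights, threshold=0.01):
    """Zero out mixture weights below `threshold` and renormalize.

    Small weights contribute little to a state's output density, so
    pruning them shrinks the weight tables with minimal accuracy impact.
    (Illustrative sketch; the threshold value is an assumption.)
    """
    w = np.asarray(weights, dtype=float)
    pruned = np.where(w >= threshold, w, 0.0)
    total = pruned.sum()
    if total == 0.0:
        # Guard: if every weight falls below the threshold,
        # keep only the single largest component.
        pruned[np.argmax(w)] = 1.0
        total = 1.0
    return pruned / total
```

After thresholding, only the surviving components' Gaussians need to be evaluated for that state, which is where the storage and compute savings come from.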