Clustering techniques have been integrated at different levels into the training procedure of a continuous-density hidden Markov model (HMM) speech recognizer. These techniques are used in two ways. First, acoustically similar states are tied together (state tying); this reduces the number of parameters and also allows rarely seen states to be trained together with more robustly estimated ones. Second, densities are clustered across states (density clustering); this reduces the number of densities while preserving the best performance of our recognizer. We have applied these techniques to both word-based small-vocabulary and phoneme-based large-vocabulary recognition tasks. On the WSJ task, we achieved a 7% reduction in word error rate. On the TI/NIST connected-digit task, the number of parameters was reduced by a factor of 2 to 3 while keeping the same string error rate.
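The idea of tying acoustically similar states can be illustrated with a minimal sketch. The abstract does not specify the distance measure or the stopping criterion, so the Euclidean distance between state means and the fixed `threshold` used here are assumptions, and `tie_states` is a hypothetical name chosen for illustration. Each state is reduced to a single mean vector and a training-frame count; a real recognizer would compare full mixture densities.

```python
import numpy as np

def tie_states(means, counts, threshold):
    """Greedy bottom-up state tying (illustrative sketch only).

    means:     (n_states, dim) per-state mean vectors
    counts:    (n_states,) number of training frames seen per state
    threshold: stop merging once the closest pair is farther apart than this
    Returns a list of clusters, each a list of original state indices.
    """
    clusters = [[i] for i in range(len(means))]
    mu = [np.asarray(m, dtype=float) for m in means]
    n = [float(c) for c in counts]

    while len(clusters) > 1:
        # Find the closest pair of cluster means (assumed Euclidean distance).
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.linalg.norm(mu[a] - mu[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        # Merge b into a; the tied mean is the count-weighted average,
        # so a rarely seen state inherits the estimate of a robust one.
        mu[a] = (n[a] * mu[a] + n[b] * mu[b]) / (n[a] + n[b])
        n[a] += n[b]
        clusters[a].extend(clusters[b])
        del clusters[b], mu[b], n[b]
    return clusters

# Four toy states: two near the origin, two near (5, 5); states 1 and 3 are rare.
tied = tie_states([[0, 0], [0.1, 0], [5, 5], [5.1, 5]], [100, 3, 80, 2], 1.0)
```

In this toy run the rarely seen states 1 and 3 end up sharing parameters with the well-trained states 0 and 2, which is the effect the abstract attributes to state tying. The same merging loop could be applied to individual densities pooled across states as a rough analogue of density clustering, again under the assumed distance measure.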