Comparison results for segmental training algorithms for mixture density HMMs

This work presents experiments on four segmental training algorithms for mixture density HMMs. The segmental versions of SOM and LVQ3 suggested by the author are compared against the conventional segmental K-means and the segmental GPD. The test bench is a speaker-dependent but vocabulary-independent automatic speech recognition task. The output density function of each state in each model is a mixture of multivariate Gaussian densities. The neural network methods SOM and LVQ are applied to learn the parameters of the density models from the mel-cepstrum features of the training samples. Because the segmentation and the segment classification depend on each other, segmental training improves the segmentation and the model parameters in alternation to obtain the best possible result. To start the training process, it suffices to divide the training samples approximately into phoneme samples.
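The alternation between segmentation and parameter estimation can be illustrated with a minimal sketch in the spirit of segmental K-means. It is not the author's implementation. As simplifying assumptions, each state is modelled by a single Gaussian with identity covariance rather than a mixture, transition probabilities are ignored, the model is strictly left-to-right without skips, and all function names are hypothetical. The initial uniform split stands in for the approximate division into phoneme samples.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def sq_dist(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance (negative log-likelihood of a unit Gaussian, up to constants)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def uniform_segmentation(n_frames: int, n_states: int) -> List[int]:
    """Rough initial alignment: split the frames into runs of about equal length."""
    return [min(t * n_states // n_frames, n_states - 1) for t in range(n_frames)]


def estimate_means(frames: List[Frame], labels: List[int], n_states: int) -> List[List[float]]:
    """Re-estimate one mean vector per state from the frames currently assigned to it."""
    dim = len(frames[0])
    means = []
    for s in range(n_states):
        members = [f for f, lab in zip(frames, labels) if lab == s]
        means.append([sum(f[d] for f in members) / len(members) for d in range(dim)])
    return means


def viterbi_segmentation(frames: List[Frame], means: List[List[float]]) -> List[int]:
    """Lowest-cost monotonic alignment that starts in state 0 and ends in the last state."""
    n, n_states = len(frames), len(means)
    inf = float("inf")
    cost = [[inf] * n_states for _ in range(n)]
    back = [[0] * n_states for _ in range(n)]
    cost[0][0] = sq_dist(frames[0], means[0])
    for t in range(1, n):
        for s in range(n_states):
            stay = cost[t - 1][s]
            move = cost[t - 1][s - 1] if s > 0 else inf
            best, back[t][s] = (move, s - 1) if move < stay else (stay, s)
            if best < inf:
                cost[t][s] = best + sq_dist(frames[t], means[s])
    labels = [n_states - 1] * n
    for t in range(n - 1, 0, -1):  # trace the best path backwards
        labels[t - 1] = back[t][labels[t]]
    return labels


def segmental_train(frames: List[Frame], n_states: int,
                    max_iter: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Alternate parameter estimation and re-segmentation until the alignment stops changing."""
    assert len(frames) >= n_states, "need at least one frame per state"
    labels = uniform_segmentation(len(frames), n_states)
    for _ in range(max_iter):
        means = estimate_means(frames, labels, n_states)
        new_labels = viterbi_segmentation(frames, means)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, estimate_means(frames, labels, n_states)


# Toy one-dimensional "features": the uniform split misplaces the frame 9.9,
# and the first re-segmentation moves it into the last state.
frames = [[0.0], [0.1], [0.2], [5.0], [5.1], [9.9], [10.0], [10.1], [10.2]]
labels, means = segmental_train(frames, n_states=3)
print(labels)  # [0, 0, 0, 1, 1, 2, 2, 2, 2]
```

In this sketch the rough initial boundaries are corrected after one pass, which mirrors the claim that an approximate initial segmentation is enough to start training. A mixture density version would replace the single mean per state with several weighted Gaussian components.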
