Syllable recognition using syllable-segment statistics and syllable-based HMM

In our previous research, we demonstrated the validity of the segmental-unit input hidden Markov model (HMM), which treats the mel-cepstrum coefficients of four successive frames as a single feature vector and reduces that vector to a lower dimension using the Karhunen-Loève (KL) transform. However, that model captures the correlation between frames only within a short section, not over a long section. In this paper, to represent correlation over a long distance, we use syllable-segment statistics calculated from the concatenation of the feature vectors corresponding to each state of a syllable-based HMM. Combining this approach with the segmental-unit input HMM improved the syllable recognition rate from 83% to 87% for syllables taken from continuous speech, without using a language model. We also show that the approach is effective for continuous speech recognition.
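
The two feature constructions described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 10-dimensional cepstra, the reduced dimension of 16, the four-state syllable model, and the random input data are all hypothetical, and averaging the frames aligned to each state before concatenation is an assumption, since the abstract does not say how the per-state vectors are formed. The KL transform is realized here as a projection onto the leading eigenvectors of the sample covariance matrix.

```python
import numpy as np

def stack_frames(cepstra, width=4):
    """Concatenate each run of `width` successive frames into one segment vector."""
    n_frames = cepstra.shape[0]
    n_segments = n_frames - width + 1
    return np.stack([cepstra[i:i + width].reshape(-1) for i in range(n_segments)])

def kl_transform(vectors, out_dim):
    """Project centered vectors onto the top `out_dim` covariance eigenvectors."""
    mean = vectors.mean(axis=0)
    centered = vectors - mean
    cov = np.cov(centered, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)        # eigenvalues come back in ascending order
    basis = eigvecs[:, ::-1][:, :out_dim]   # keep the largest-variance directions
    return centered @ basis, mean, basis

def syllable_segment_vector(features, state_alignment, n_states):
    """Concatenate one vector per HMM state into a single syllable-level vector.

    Assumption: each state is summarized by the mean of its aligned frames,
    and every state has at least one aligned frame.
    """
    parts = [features[state_alignment == s].mean(axis=0) for s in range(n_states)]
    return np.concatenate(parts)

# Hypothetical data: 40 frames of 10-dimensional mel-cepstrum coefficients.
rng = np.random.default_rng(0)
cepstra = rng.standard_normal((40, 10))

segments = stack_frames(cepstra, width=4)             # short-range context
reduced, mean, basis = kl_transform(segments, out_dim=16)

# Hypothetical state alignment for a 4-state syllable HMM (one label per segment).
alignment = np.repeat(np.arange(4), [10, 9, 9, 9])
syllable_vec = syllable_segment_vector(reduced, alignment, n_states=4)
```

Under these assumptions, `reduced` carries the short-range correlation within four frames, while `syllable_vec` spans the whole syllable and could be scored with a separate statistical model alongside the HMM likelihood.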