Paper information - Multi-level clustering of acoustic features for phoneme recognition based on mutual information

An optimal method for organizing acoustic features to recognize phonemes in continuous speech is described. Each level of acoustic features, including power and its variational pattern, and the linear predictive coding (LPC) Mel-cepstrum and its pattern of temporal change, is clustered hierarchically on the basis of the mutual information between the acoustic feature vector and the phoneme labels assigned to the speech waveform. Multilevel clustering is used to discriminate phonemes by detecting the most reliable features in context and by using effective combinations of acoustic characteristics. Phoneme recognition for each frame is discussed. The conditional entropy of a frame's phoneme label is evaluated given the various acoustic features of the neighboring frames, and phoneme discrimination can be performed effectively using this conditional entropy. In a preliminary test, the phoneme recognition rate was 81.6% and the vowel recognition rate was 92.4% at the frame level. In a completely talker-independent experiment, the recognition rates were 76.8% and 89.7%, respectively.
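The abstract does not give formulas, so the following is only a minimal sketch of the two quantities it names, using the standard textbook definitions: the conditional entropy H(phoneme | cluster) and the mutual information I(cluster; phoneme) = H(phoneme) - H(phoneme | cluster), estimated from per-frame counts. The function names, the integer cluster IDs, and the toy frame data are all hypothetical and are not taken from the paper. The hierarchical clustering procedure itself is not reproduced here.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def conditional_entropy(pairs):
    """H(phoneme | cluster) in bits from (cluster, phoneme) frame pairs."""
    pairs = list(pairs)
    n = len(pairs)
    by_cluster = {}
    for cluster, phoneme in pairs:
        by_cluster.setdefault(cluster, Counter())[phoneme] += 1
    # Weight each cluster's phoneme entropy by the cluster's share of frames.
    return sum(
        sum(c.values()) / n * entropy(c.values()) for c in by_cluster.values()
    )


def mutual_information(pairs):
    """I(cluster; phoneme) = H(phoneme) - H(phoneme | cluster), in bits."""
    pairs = list(pairs)
    phoneme_counts = Counter(phoneme for _, phoneme in pairs)
    return entropy(phoneme_counts.values()) - conditional_entropy(pairs)


# Toy data (assumed): each frame is (cluster id of its feature vector, phoneme label).
informative = [(0, "a"), (0, "a"), (1, "i"), (1, "i")]    # cluster determines phoneme
uninformative = [(0, "a"), (0, "i"), (1, "a"), (1, "i")]  # cluster tells nothing

print(mutual_information(informative), conditional_entropy(informative))
print(mutual_information(uninformative), conditional_entropy(uninformative))
```

Under these definitions, a clustering whose cells line up with phoneme labels has high mutual information and low conditional entropy, which is the sense in which a clustering criterion based on mutual information favors features that discriminate phonemes.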

Katsuhiko Shirai | Noriyuki Aoki | Naoki Hosaka

[1] Katsuhiko Shirai, et al. Phoneme recognition in connected speech using both static and dynamic properties of spectrum described by vector quantization, 1986, ICASSP '86, IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2] Alex Waibel, et al. Phoneme recognition: neural networks vs. hidden Markov models, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.

[3] Hsiao-Wuen Hon, et al. Large-vocabulary speaker-independent continuous speech recognition using HMM, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.

[4] Robert M. Gray, et al. An Algorithm for Vector Quantizer Design, 1980, IEEE Trans. Commun.

[5] John Makhoul, et al. BYBLOS: The BBN continuous speech recognition system, 1987, ICASSP '87, IEEE International Conference on Acoustics, Speech, and Signal Processing.