Paper information - Decision tree-based simultaneous clustering of phonetic contexts, dimensions, and state positions for acoustic modeling

In recent years, context-dependent hidden Markov models (HMMs), typically triphones with continuous-density output distributions, have often been used. Using triphones introduces too many free parameters into a system, which makes it difficult to estimate statistically reliable models. Various parameter clustering techniques have therefore been proposed, and state tying based on Phonetic Decision Trees (P-DT) is a good solution to this problem. However, the state-tying technique cannot construct a proper context-dependent sharing structure or assign a proper number of free parameters for each dimension. In this paper, Phonetic and Dimensional Decision Trees (PD-DT) are proposed by introducing the minimum description length (MDL) based dimensional-split technique into P-DT. Furthermore, by incorporating questions about state positions into PD-DT, Phonetic, Dimensional and State positional Decision Trees (PDS-DT) are defined. In speaker-independent continuous speech recognition experiments, the proposed technique achieved an error reduction of about 13%–15% over the P-DT based state-tying technique.
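
To make the MDL-based splitting idea in the abstract more concrete, the following is a minimal sketch of an MDL split test for diagonal-covariance Gaussians, in the general form commonly used for decision tree state tying. It is not the paper's PD-DT or PDS-DT algorithm: the function names, the statistics layout `(occupancy, sum_x, sum_x2)`, and the variance floor are assumptions made for illustration. A candidate question is accepted when the change in description length is negative. Passing statistics for a single dimension is one way to picture a per-dimension test, but how the paper actually combines dimensional splits with phonetic questions is not reproduced here.

```python
import math

VAR_FLOOR = 1e-6  # assumed variance floor to keep log() finite


def pooled_log_det(stats):
    """Return (occupancy, sum of log variances) of the pooled diagonal Gaussian.

    Each item of `stats` is a hypothetical tuple (occupancy, sum_x, sum_x2),
    where sum_x and sum_x2 are per-dimension first- and second-order sums.
    """
    occ = sum(s[0] for s in stats)
    dim = len(stats[0][1])
    log_det = 0.0
    for d in range(dim):
        mean = sum(s[1][d] for s in stats) / occ
        var = sum(s[2][d] for s in stats) / occ - mean * mean
        log_det += math.log(max(var, VAR_FLOOR))
    return occ, log_det


def mdl_delta(yes, no, total_occ):
    """Change in description length if a node is split into `yes` and `no`.

    The first term is the negative log-likelihood gain of the split; the
    second is the penalty for the extra mean and variance parameters
    (2 * dim parameters, each costing 0.5 * log(total_occ)).
    """
    occ_y, ld_y = pooled_log_det(yes)
    occ_n, ld_n = pooled_log_det(no)
    occ_p, ld_p = pooled_log_det(yes + no)
    dim = len(yes[0][1])
    likelihood_term = 0.5 * (occ_y * ld_y + occ_n * ld_n - occ_p * ld_p)
    penalty = dim * math.log(total_occ)
    return likelihood_term + penalty


# Two one-dimensional states with unit variance and means 0 and 10.
state_a = (100.0, [0.0], [100.0])
state_b = (100.0, [1000.0], [10100.0])

# Well-separated states: the split shortens the description length.
print(mdl_delta([state_a], [state_b], 200.0) < 0)
# Identical states: the split only adds parameters, so it is rejected.
print(mdl_delta([state_a], [state_a], 200.0) > 0)
```

In this form the penalty grows with the number of dimensions being split, so testing fewer dimensions at a time lowers the cost of each split. That is only an intuition for why a per-dimension criterion can allocate parameters unevenly across dimensions, not a statement of the paper's method.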

Heiga Zen | Keiichi Tokuda | Tadashi Kitamura
