Automatic generation of non-uniform context-dependent HMM topologies based on the MDL criterion

We propose a new method of automatically creating non-uniform context-dependent HMM topologies by using the Minimum Description Length (MDL) criterion. Phonetic decision tree clustering, which is based on the Maximum Likelihood (ML) criterion, is widely used but creates only contextual variations. It also requires control parameters, such as the total number of states, to be determined empirically in advance for use as stop criteria. Furthermore, it cannot automatically create topologies with various state lengths. We therefore introduce the MDL criterion as both the split and the stop criterion, and use the Successive State Splitting (SSS) algorithm to generate contextual and temporal variations. The proposed method, MDL-SSS, can automatically create proper topologies without such predetermined parameters. Experimental results show that MDL-SSS stops splitting automatically and obtains more appropriate HMM topologies than the original algorithm. Furthermore, we investigated MDL-SSS combined with phonetic decision tree clustering; this combined method automatically obtains the best performance without any heuristic tuning.
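As an illustration of how a single MDL test can serve as both split and stop criterion, the following minimal sketch assumes the common two-part form of the description length, in which a split changes it by the negative log-likelihood gain plus a penalty of (added parameters / 2) * log(number of training frames). This form, the function names, and the candidate format are assumptions made for illustration; the abstract does not give the exact formulation used in MDL-SSS.

```python
import math

def mdl_delta(loglik_gain, added_params, total_frames, penalty_weight=1.0):
    """Change in description length if a split is accepted.

    Assumed form: -gain + weight * (added_params / 2) * log(total_frames).
    A negative value means the split shortens the description.
    """
    penalty = penalty_weight * 0.5 * added_params * math.log(total_frames)
    return -loglik_gain + penalty

def best_split(candidates, total_frames):
    """Pick the candidate split that reduces description length the most.

    candidates: list of (label, loglik_gain, added_params) tuples
    (a hypothetical format). Returns the winning label, or None when no
    candidate reduces the description length, which is the stop condition.
    """
    best_label, best_delta = None, 0.0
    for label, gain, added in candidates:
        delta = mdl_delta(gain, added, total_frames)
        if delta < best_delta:
            best_label, best_delta = label, delta
    return best_label

# Hypothetical gains for one state: a contextual and a temporal split.
candidates = [("contextual", 30.0, 4), ("temporal", 12.0, 4)]
print(best_split(candidates, total_frames=1000))
```

Because the penalty grows with the number of added parameters and with the amount of data, splitting halts by itself once no candidate gain exceeds it, so no target number of states has to be fixed beforehand.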
