Extended Decision Tree with OR Relationship for HMM-Based Speech Synthesis

This paper proposes a variant of the decision tree (DT) for HMM-based speech synthesis, which we call the Extended Decision Tree with OR Relationship (EDTOR). In a conventional DT, each leaf node is reached by exactly one path: a series of yes/no questions answered from the root node down to that leaf. The condition for deciding whether the acoustic parameters of a context label belong to a given leaf node is therefore a conjunction (logical AND) of question answers. However, some linguistic knowledge cannot be represented compactly and efficiently by AND expressions alone. We introduce an OR relationship at the leaf-node level to loosen this restriction. Preliminary experimental results show that EDTOR can either (1) greatly reduce the number of DT leaf nodes (i.e., the model size) without affecting speech synthesis performance, which is appealing for embedded applications, or (2) slightly improve performance compared with a conventional DT that has the same number of leaf nodes as the EDTOR.
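The contrast between AND-only leaf conditions and an OR relationship at the leaf level can be illustrated with a minimal sketch. It is one possible reading of the idea, not the paper's algorithm: it assumes the OR relationship lets several leaves share one set of acoustic parameters, and the questions, leaf names, and model names (`is_vowel`, `leaf_A`, `model_1`, and so on) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, str]               # a simplified context label
Question = Callable[[Context], bool]   # a yes/no question about a context
Path = List[Tuple[str, bool]]          # (question name, required answer) pairs

# Hypothetical yes/no questions; real question sets are far larger.
QUESTIONS: Dict[str, Question] = {
    "is_vowel": lambda c: c["phone"] in {"a", "e", "i", "o", "u"},
    "is_stressed": lambda c: c["stress"] == "1",
}

# Conventional DT: every leaf is defined by exactly one root-to-leaf path,
# i.e. a conjunction (AND) of question answers.
LEAF_PATHS: Dict[str, Path] = {
    "leaf_A": [("is_vowel", True), ("is_stressed", True)],
    "leaf_B": [("is_vowel", True), ("is_stressed", False)],
    "leaf_C": [("is_vowel", False)],
}

# Assumed OR relationship: several leaves share one model, so the condition
# for a model is a disjunction (OR) of AND paths.
SHARED_MODELS: Dict[str, List[str]] = {
    "model_1": ["leaf_A", "leaf_C"],
    "model_2": ["leaf_B"],
}


def matches_path(ctx: Context, path: Path) -> bool:
    """AND condition: every question on the path must give the required answer."""
    return all(QUESTIONS[name](ctx) == answer for name, answer in path)


def find_leaf(ctx: Context) -> str:
    """Return the single leaf whose AND path the context satisfies."""
    for leaf, path in LEAF_PATHS.items():
        if matches_path(ctx, path):
            return leaf
    raise ValueError("context matches no leaf")


def find_model(ctx: Context) -> str:
    """Return the model for which ANY of the grouped leaf paths matches."""
    for model, leaves in SHARED_MODELS.items():
        if any(matches_path(ctx, LEAF_PATHS[leaf]) for leaf in leaves):
            return model
    raise ValueError("context matches no model")


if __name__ == "__main__":
    stressed_vowel = {"phone": "a", "stress": "1"}
    consonant = {"phone": "k", "stress": "0"}
    print(find_leaf(stressed_vowel), find_model(stressed_vowel))
    print(find_leaf(consonant), find_model(consonant))
```

In this toy setup, three leaves map to two models, which mirrors the kind of model-size reduction described above: contexts on different AND paths can share parameters without adding further questions to the tree.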
