Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using two-pass decision tree construction

This paper demonstrates how unsupervised cross-lingual adaptation of HMM-based speech synthesis models may be performed without explicit knowledge of the adaptation data language. A two-pass decision tree construction technique is deployed for this purpose. Using parallel translated datasets, cross-lingual and intralingual adaptation are compared in a controlled manner. Listener evaluations reveal that the proposed method delivers performance approaching that of unsupervised intralingual adaptation.
