Paper information - HMM-based polyglot speech synthesis by speaker and language adaptive training

This paper describes a technique for speaker and language adaptive training (SLAT) for HMM-based polyglot speech synthesis and its evaluation on a multilingual speech corpus. The SLAT technique allows multi-speaker, multi-language adaptive training and synthesis to be performed. Experimental results show that the SLAT technique achieves better naturalness than both speaker-adaptively trained language-dependent (LD-SAT) and language-independent (LI-SAT) models. In cross-lingual adaptation speaker similarity tests, SLAT and LI-SAT outperform LD-SAT, but there are still significant differences between polyglot adaptation and intra-language adaptation.

[1] Mark J. F. Gales, et al. Maximum likelihood linear transformations for HMM-based speech recognition, 1998, Comput. Speech Lang.

[2] Heiga Zen, et al. The HTS-2008 System: Yet Another Evaluation of the Speaker-Adaptive HMM-based Speech Synthesis System in The 2008 Blizzard Challenge, 2008.

[3] H. Zen, et al. An HMM-based speech synthesis system applied to English, 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002.

[4] Nick Campbell. Talking Foreign - concatenative speech synthesis and the language barrier, 2001, INTERSPEECH.

[5] Junichi Yamagishi, et al. Average-Voice-Based Speech Synthesis, 2006.

[6] Jan Odijk, et al. Introduction to multilingual corpus-based concatenative speech synthesis, 2007, INTERSPEECH.

[7] Heiga Zen, et al. Statistical Parametric Speech Synthesis, 2007, IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8] Mark J. F. Gales, et al. Adaptive training using structured transforms, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9] Richard Sproat, et al. Multilingual Text-to-Speech Synthesis: The Bell Labs Approach, 1998, CL.

[10] Yoshihiko Nankaku, et al. State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis, 2009, INTERSPEECH.

[11] Yong Zhao, et al. Microsoft Mulan - a bilingual TTS system, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

[12] Alan W. Black, et al. Multilingual text-to-speech synthesis, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[13] Heiga Zen, et al. An HMM-based speech synthesis system applied to English, 2003.

[14] Silvia Quazza, et al. ACTOR: A multilingual unit-selection speech synthesis system, 2001, SSW.

[15] Mark J. F. Gales. Cluster adaptive training of hidden Markov models, 2000, IEEE Trans. Speech Audio Process.

[16] Tanja Schultz, et al. Speaker Clustering for Multilingual Synthesis, 2006.

[17] Keiichi Tokuda, et al. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis, 1999, EUROSPEECH.

[18] Sadaoki Furui, et al. New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer, 2006, Speech Commun.

[19] Heiga Zen. Speaker and language adaptive training for HMM-based polyglot speech synthesis, 2010, INTERSPEECH.

[20] Beat Pfister, et al. From multilingual to polyglot speech synthesis, 1999, EUROSPEECH.

[21] Richard M. Schwartz, et al. A compact model for speaker-adaptive training, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[22] Keiichi Tokuda, et al. A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis, 2007, IEICE Trans. Inf. Syst.

[23] Heiga Zen, et al. Context-dependent additive log f_0 model for HMM-based speech synthesis, 2009, INTERSPEECH.

[24] W·M·贝尔特曼, et al. Speech audio process, 2011.

[25] Richard Sproat. Multilingual Text-to-Speech Synthesis, 1997.

[26] Frank K. Soong, et al. A cross-language state mapping approach to bilingual (Mandarin-English) TTS, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[27] Tanja Schultz, et al. GlobalPhone: a multilingual speech and text database developed at Karlsruhe University, 2002, INTERSPEECH.