New Approach to Polyglot Synthesis: How to Speak Any Language with Anyone's Voice

In this paper we present a new method for synthesizing multiple languages with the voice of any arbitrary speaker, which we call “HMM-based speaker-adaptable polyglot synthesis”. The idea is to mix data from several speakers in different languages to create a speaker-independent multilingual acoustic model. By means of maximum likelihood linear regression (MLLR), this model can be adapted to the voice of any given speaker. With the adapted model, speech can be synthesized in any of the languages included in the training corpus with the voice of the target speaker, regardless of the language that speaker speaks. When the language to be synthesized differs from the language of the target speaker, our method outperforms approaches based on monolingual models and phone mapping. Languages with no available speech resources can also be synthesized with a polyglot synthesizer by means of phone mapping; in this case, the polyglot synthesizer outperforms any monolingual synthesizer based on one of the languages used to train it.
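To make the adaptation step concrete, the sketch below estimates a single global MLLR mean transform from adaptation data and applies it to the Gaussian means of a speaker-independent model. This is an illustrative simplification, not the paper's implementation: the function names are hypothetical, a real system would use regression classes and the model covariances in the accumulators, whereas here identity covariances are assumed so the transform has a simple closed form.

```python
import numpy as np

def estimate_mllr_transform(means, adapt_frames, posteriors):
    """Estimate a global MLLR mean transform W = [b | A].

    means:        (G, D) Gaussian means of the speaker-independent model
    adapt_frames: (T, D) adaptation feature vectors from the target speaker
    posteriors:   (T, G) Gaussian occupation probabilities gamma_t(g)
    Assumes identity covariances (a full system weights the
    accumulators by the model covariances).
    """
    G, D = means.shape
    xi = np.hstack([np.ones((G, 1)), means])   # extended means [1, mu]
    occ = posteriors.sum(axis=0)               # (G,) occupancies
    k = posteriors.T @ adapt_frames            # (G, D) weighted frame sums
    Z = xi.T @ k                               # (D+1, D)
    Gmat = xi.T @ (occ[:, None] * xi)          # (D+1, D+1)
    # Solve Gmat @ W.T = Z, i.e. W = (k.T @ xi) @ inv(Gmat)
    return np.linalg.solve(Gmat, Z).T          # (D, D+1)

def adapt_means(means, W):
    """Apply the transform: mu_hat = A @ mu + b."""
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T
```

After estimating `W` from a few minutes of target-speaker speech, `adapt_means` replaces the means of every Gaussian in the multilingual model, so synthesis in any training language then uses the target speaker's voice.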