Cross-lingual voice conversion-based polyglot speech synthesizer for Indian languages

A polyglot speech synthesizer synthesizes speech for any given monolingual or multilingual text in a single speaker’s voice. Building one requires a polyglot speech corpus, but it is difficult to find a speaker proficient in multiple languages. Therefore, in the current work, a polyglot speech corpus is obtained for four Indian languages and Indian English using GMM-based cross-lingual voice conversion, which exploits the acoustic similarity of phonemes across Indian languages. The optimum target speaker and GMM topology are chosen based on the performance of a speaker identification system. It is observed that the language that shares the largest number of phonemes with the other languages serves as the best target. A polyglot speech corpus derived in this target speaker’s voice is then used to develop an HMM-based polyglot speech synthesizer. The performance of this synthesizer is evaluated in terms of speaker identity using an ABX listening test, quality using mean opinion score (MOS), and speaker switching using a subjective listening test.
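The observation about phoneme sharing can be illustrated with a minimal sketch. It is not the selection procedure used in the work, which relies on speaker identification performance; it only counts, for each language, how many phonemes it has in common with the others. The function names and the toy inventories (`lang_a`, `p1`, and so on) are hypothetical placeholders, not real phoneme sets.

```python
from typing import Dict, Set


def shared_phoneme_counts(inventories: Dict[str, Set[str]]) -> Dict[str, int]:
    """For each language, sum the phonemes it shares with every other language."""
    counts = {}
    for lang, phones in inventories.items():
        counts[lang] = sum(
            len(phones & other_phones)
            for other, other_phones in inventories.items()
            if other != lang
        )
    return counts


def most_shared_language(inventories: Dict[str, Set[str]]) -> str:
    """Return the language with the largest total phoneme overlap."""
    counts = shared_phoneme_counts(inventories)
    return max(counts, key=counts.get)


# Hypothetical placeholder inventories (assumption: symbols are illustrative only).
toy_inventories = {
    "lang_a": {"p1", "p2", "p3", "p4"},
    "lang_b": {"p1", "p2", "p5"},
    "lang_c": {"p2", "p3", "p6"},
}

if __name__ == "__main__":
    print(shared_phoneme_counts(toy_inventories))
    print(most_shared_language(toy_inventories))
```

With real phoneme inventories in place of the placeholders, the language returned by `most_shared_language` would be the candidate that the abstract's observation suggests as a good target, subject to confirmation by the speaker identification results.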
