论文信息 - High-Individuality Voice Conversion Based on Concatenative Speech Synthesis

High-Individuality Voice Conversion Based on Concatenative Speech Synthesis

Concatenative speech synthesis is a method that can make speech sound which has naturalness and high-individuality of a speaker by introducing a large speech corpus. Based on this method, in this paper, we propose a voice conversion method whose conversion speech has high-individuality and naturalness. The authors also have two subjective evaluation experiments for evaluating individuality and sound quality of conversion speech. From the results, following three facts have be confirmed: (a) the proposal method can convert the individuality of speakers well, (b) employing the framework of unit selection (especially join cost) of concatenative speech synthesis into conventional voice conversion improves the sound quality of conversion speech, and (c) the proposal method is robust against the difference of genders between a source speaker and a target speaker. Keywords—concatenative speech synthesis, join cost, speaker individuality, unit selection, voice conversion

Kei Fujii | Kaori Suigetsu | Jun Okawa

[1] K. Shikano,et al. Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[2] M. Abe. A segment-based approach to voice conversion , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[3] Hideki Kashioka,et al. A TRIAL TO APPLY CONCATENATIVE SPEECH SYNTHESIS TO SPONTANEOUS SPEECH , 2006 .

[4] Alan W. Black,et al. Prosody and the Selection of Source Units for Concatenative Synthesis , 1997 .

[5] Hermann Ney,et al. Text-Independent Voice Conversion Based on Unit Selection , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[6] Keiichi Tokuda,et al. XIMERA: a new TTS from ATR based on corpus-based technologies , 2004, SSW.

[7] Alexander Kain,et al. Spectral voice conversion for text-to-speech synthesis , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[8] Mark Huckvale,et al. Improvements in Speech Synthesis , 2001 .

[9] Eric Moulines,et al. Statistical methods for voice quality transformation , 1995, EUROSPEECH.