New refinement schemes for voice conversion

This paper proposes new refinement schemes for voice conversion. We take mel-frequency cepstral coefficients (MFCC) as the basic feature and apply cepstral mean subtraction to compensate for channel effects. We propose an S/U/V (silence/unvoiced/voiced) decision rule so that two separate codebooks capture the difference between the unvoiced and voiced segments of the source speaker. We then apply three schemes to refine the synthesized voice: pitch refinement with PSOLA, energy equalization, and frame concatenation based on synchronized pitch marks. ABX listening tests and mean opinion score (MOS) ratings indicate that the voice conversion system performs satisfactorily.
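To make three of the steps named in the abstract concrete, the following is a minimal Python sketch of cepstral mean subtraction, a frame-level S/U/V decision, and energy equalization. It is not the paper's implementation: the abstract does not state the actual decision rule, so the use of RMS energy and zero-crossing rate, the threshold values `silence_rms` and `unvoiced_zcr`, and all function names are assumptions chosen for illustration only.

```python
import math


def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean (over all frames) from each frame.

    `frames` is a list of equal-length cepstral vectors (e.g. MFCCs).
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]


def rms(samples):
    """Root-mean-square amplitude of one frame of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def classify_suv(samples, silence_rms=0.01, unvoiced_zcr=0.3):
    """Label a frame 'S', 'U', or 'V'.

    Assumed rule (not from the paper): low energy means silence, a high
    zero-crossing rate means unvoiced, anything else is voiced.
    Both thresholds are hypothetical and would need tuning on real data.
    """
    if rms(samples) < silence_rms:
        return "S"
    if zero_crossing_rate(samples) > unvoiced_zcr:
        return "U"
    return "V"


def equalize_energy(converted, reference):
    """Scale `converted` so that its RMS matches that of `reference`."""
    source_level = rms(converted)
    if source_level == 0.0:
        return list(converted)  # nothing to scale
    gain = rms(reference) / source_level
    return [s * gain for s in converted]
```

In a system like the one described, the S/U/V label would select which codebook (unvoiced or voiced) is used for a frame, and energy equalization would be applied to each synthesized frame before concatenation; how the paper actually wires these stages together is not specified in the abstract.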
