Implementation of F0 transformation for statistical singing voice conversion based on direct waveform modification

This paper presents a technique for transforming F0 within a statistical singing voice conversion framework that modifies the waveform directly using a spectrum differential (DIFFSVC). The DIFFSVC method converts the voice timbre of a source singer into that of a target singer without vocoder-based waveform generation. Although this method yields converted singing voices of high sound quality, it has been limited to intra-gender conversion, where F0 transformation is not needed. To extend DIFFSVC to cross-gender conversion, we propose a method for transforming the F0 of the input singing voice. The proposed method is also based on direct waveform modification, using an overlap-add process and a filtering process. Subjective evaluation results demonstrate that, in cross-gender conversion, the proposed DIFFSVC method with F0 transformation significantly improves the sound quality of the converted singing voices while preserving the conversion accuracy of singer identity, compared to conventional vocoder-based SVC.
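
The abstract does not spell out the overlap-add and filtering steps. As a rough illustration only, one common vocoder-free way to scale F0 is to time-stretch the waveform with a waveform-similarity overlap-add (WSOLA) and then resample the result so that the duration returns to roughly its original value. The sketch below assumes that approach. The function names (`wsola_stretch`, `shift_f0`), the frame length, the search tolerance, and the use of linear interpolation in place of a proper band-limited resampling filter are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def wsola_stretch(x, rate, frame_len=512, tol=128):
    """Time-stretch x by `rate` (>1 means longer) with a basic WSOLA.

    Hypothetical helper for illustration; parameters are not from the paper.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal must be at least one frame long")
    hop_s = frame_len // 2                    # synthesis hop (50% overlap)
    hop_a = hop_s / rate                      # nominal analysis hop
    window = np.hanning(frame_len + 1)[:-1]   # periodic Hann window
    n_frames = int((len(x) - frame_len) / hop_a) + 1

    # Zero-pad so every search region and continuation stays in bounds.
    xp = np.pad(x, (tol, tol + frame_len))
    y = np.zeros((n_frames - 1) * hop_s + frame_len)
    norm = np.zeros_like(y)

    pos = tol                                 # first frame starts at x[0]
    for i in range(n_frames):
        if i > 0:
            # Natural continuation of the previously copied frame.
            target = xp[pos + hop_s: pos + hop_s + frame_len]
            centre = int(round(i * hop_a)) + tol
            region = xp[centre - tol: centre + tol + frame_len]
            # Pick the offset within +/- tol that best matches the target.
            corr = np.correlate(region, target, mode="valid")
            pos = centre - tol + int(np.argmax(corr))
        s = i * hop_s
        y[s:s + frame_len] += window * xp[pos:pos + frame_len]
        norm[s:s + frame_len] += window
    return y / np.maximum(norm, 1e-8)


def shift_f0(x, ratio, **kwargs):
    """Scale F0 by `ratio` while keeping the duration roughly unchanged."""
    stretched = wsola_stretch(x, ratio, **kwargs)
    n_out = int(round(len(stretched) / ratio))
    # Read the stretched signal `ratio` times faster (linear interpolation
    # stands in for a band-limited resampling filter here).
    read_pos = np.arange(n_out) * ratio
    return np.interp(read_pos, np.arange(len(stretched)), stretched)
```

For example, `shift_f0(x, 1.5)` would raise the pitch of `x` by about a fifth. Note that resampling alone also scales the spectral envelope, so a practical system would need to compensate for that, which this sketch does not attempt.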
