High quality voice conversion by post-filtering the outputs of Gaussian processes

Voice conversion aims to transform the individuality of source speech so that it mimics that of target speech while leaving the message unaltered. Gaussian mixture model (GMM) based methods are the most commonly used approach, but they suffer from over-smoothing and over-fitting. In our previous work, we proposed using Gaussian processes to alleviate over-fitting. Although effective, that method inevitably leads to over-smoothing, because it takes the mean of the Gaussian process predictive distribution as the optimal estimate. In this paper we therefore address the over-smoothing problem by post-filtering the outputs of the standard Gaussian processes, which yields more dynamics in the converted feature parameters. Objective and subjective experiments confirm the validity of the proposed method.
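The over-smoothing effect described above can be illustrated on a toy one-dimensional signal. The sketch below is only an illustration under stated assumptions: the RBF kernel, its hyperparameters, the synthetic signal, and the helper names (`rbf_kernel`, `gp_predictive_mean`, `variance_rescale`) are all hypothetical, and the simple variance rescaling stands in for a post-filter in general. It is not the post-filter proposed in this paper.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    # Squared-exponential kernel between two 1-D input arrays (assumed kernel).
    d2 = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_predictive_mean(x_train, y_train, x_test, noise_var=0.1):
    # Standard GP regression predictive mean: K_* (K + sigma^2 I)^{-1} y.
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_test, x_train)
    return K_s @ np.linalg.solve(K, y_train)

def variance_rescale(y_pred, target_var):
    # Hypothetical stand-in post-filter: stretch the trajectory around its
    # mean so that its variance matches a target variance.
    mu = y_pred.mean()
    scale = np.sqrt(target_var / y_pred.var())
    return mu + scale * (y_pred - mu)

# Synthetic "feature trajectory" with slow and fast components plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = np.sin(x) + 0.5 * np.sin(5.0 * x) + 0.1 * rng.standard_normal(x.size)

y_mean = gp_predictive_mean(x, y, x)             # over-smoothed estimate
y_filtered = variance_rescale(y_mean, y.var())   # variance restored

print("target variance:          ", y.var())
print("predictive-mean variance: ", y_mean.var())
print("post-filtered variance:   ", y_filtered.var())
```

In this toy setting the predictive mean suppresses the fast component, so its variance falls below that of the target trajectory; the rescaling step restores the overall variance but cannot recover detail that the mean has already removed.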
