A Voice Conversion Algorithm in the Context of Sparse Training Data

A new voice conversion algorithm is proposed. The algorithm estimates the parameters of a Gaussian mixture model (GMM) using Variational Bayesian (VB) theory and applies the resulting model to the spectral parameter mapping process of voice conversion (VC) to realize speaker conversion. The advantage of introducing VB into the VC community lies in its ability to overcome over-fitting when the training data is insufficient. Moreover, because VB infers a posterior probability distribution over the model parameters rather than a single point estimate, the parameters are estimated globally. This makes VB more accurate than traditional approaches such as Maximum Likelihood (ML) or Maximum a Posteriori (MAP) estimation. Both subjective and objective evaluations demonstrate that the proposed VB-based algorithm works well, especially when the training data is sparse. In addition, the converted speech is judged to have much better quality and speaker individuality than that produced by the traditional VC system.
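The following Python sketch illustrates the general idea under stated assumptions; it is not the authors' implementation. It assumes a joint-density GMM fitted on time-aligned source/target spectral frames (the frame alignment step is assumed to be done already), uses scikit-learn's `BayesianGaussianMixture` as a stand-in VB estimator, and converts frames with the conditional-expectation (minimum mean-square error) mapping. The helper names `train_vb_gmm` and `convert` are hypothetical. For simplicity, the mapping plugs in the posterior-mean parameters reported by scikit-learn rather than the full VB predictive distribution.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def train_vb_gmm(x, y, n_components=8, seed=0):
    """Fit a joint-density GMM on stacked [x, y] frames with variational Bayes.

    Assumption: x (source) and y (target) are already time-aligned, shape (n, d).
    """
    z = np.hstack([x, y])
    gmm = BayesianGaussianMixture(
        n_components=n_components,
        covariance_type="full",
        weight_concentration_prior_type="dirichlet_distribution",
        max_iter=500,
        random_state=seed,
    )
    gmm.fit(z)
    return gmm


def _log_gauss(x, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of x."""
    d = x.shape[1]
    chol = np.linalg.cholesky(cov)
    diff = np.linalg.solve(chol, (x - mean).T)  # whitened residuals, shape (d, n)
    maha = np.sum(diff ** 2, axis=0)  # squared Mahalanobis distances
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)


def convert(gmm, x):
    """Map source frames x to E[y | x] under the joint GMM.

    Uses posterior-mean parameters (a simplification of full VB prediction).
    """
    dx = x.shape[1]
    n_comp = gmm.n_components
    log_post = np.empty((x.shape[0], n_comp))
    cond_means = []
    for m in range(n_comp):
        mu, cov = gmm.means_[m], gmm.covariances_[m]
        mu_x, mu_y = mu[:dx], mu[dx:]
        s_xx = cov[:dx, :dx]
        s_yx = cov[dx:, :dx]
        # Unnormalized log p(m | x) from the marginal over the source part.
        log_post[:, m] = np.log(gmm.weights_[m]) + _log_gauss(x, mu_x, s_xx)
        # Per-component regression: mu_y + S_yx S_xx^{-1} (x - mu_x).
        cond_means.append(mu_y + np.linalg.solve(s_xx, (x - mu_x).T).T @ s_yx.T)
    # Normalize responsibilities in a numerically stable way.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Responsibility-weighted sum of the component regressions.
    return sum(post[:, [m]] * cond_means[m] for m in range(n_comp))
```

With sparse data, the Dirichlet prior on the mixture weights lets VB drive unneeded components toward negligible weight, which is one way a VB-trained GMM can limit over-fitting compared with an ML-trained one of the same nominal size.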