Voice conversion based on style and content separation with dual latent variable model

This paper presents a novel method for voice conversion based on style and content separation, achieved with a dual latent variable model (D-LVM). Under the D-LVM, the vocal tract spectrum of speech, represented by line spectral frequencies (LSFs), is explicitly decomposed into so-called style and content factors, which represent the speaker individuality and the speech meaning, respectively. Once style and content have been reliably separated, voice conversion is performed by reproducing the converted speech from the source speech content and the target speaker style. Objective and subjective tests show that, with a limited training dataset, the proposed method achieves better conversion performance than both the conventional GMM-based mapping system and the SVD-based bilinear model.
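To make the style/content separation idea concrete, the sketch below implements the SVD-based asymmetric bilinear factorization used as a baseline in the abstract (in the spirit of Tenenbaum and Freeman's separating style and content), not the paper's D-LVM itself. All names, dimensions, and the synthetic LSF-like data are assumptions for illustration: each observation is modeled as a per-speaker style matrix applied to a shared content vector, and "conversion" recombines source content with a target speaker's style matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed, not from the paper):
# S speakers (styles), C aligned frames (contents),
# D spectral dims (stand-in for LSF order), J latent dims.
S, C, D, J = 3, 40, 10, 4

# Synthesize noiseless data from a ground-truth bilinear model
# y[s, c] = A_true[s] @ b_true[:, c], so the fit should recover it.
A_true = rng.normal(size=(S, D, J))
b_true = rng.normal(size=(J, C))
Y = np.einsum('sdj,jc->scd', A_true, b_true)   # (S, C, D)

# Stack each speaker's (D, C) observation block vertically: (S*D, C).
Y_stacked = Y.transpose(0, 2, 1).reshape(S * D, C)

# SVD fit of the asymmetric bilinear model: truncating to J latent
# dimensions yields per-speaker style matrices and shared content vectors.
U, sv, Vt = np.linalg.svd(Y_stacked, full_matrices=False)
A = (U[:, :J] * sv[:J]).reshape(S, D, J)   # style matrices, one per speaker
B = Vt[:J]                                  # content vectors, shared (J, C)

# "Conversion": content shared across speakers, rendered with speaker 2's
# style matrix; on this noiseless rank-J data it matches speaker 2 exactly.
converted = A[2] @ B          # (D, C) converted spectra
target = Y[2].T               # ground-truth speaker-2 spectra
print(np.allclose(converted, target))   # → True
```

In this baseline the content vectors are shared across all speakers by construction, which is exactly the limitation the paper's D-LVM addresses by modeling style and content as separate latent variables rather than fixed SVD factors.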