Paper information - Advanced LPC techniques of voice regeneration for “Virtual Dubbing”

Some recent voice conversion techniques rely on models built from well-known signal processing paradigms, such as Linear Predictive Coding (LPC) and spectral modelling. We propose a voice converter based on LPC in which suitably trained Gaussian Mixture Models transform the encoder coefficients, which account for the glottal characteristics of a source voice, into new coefficients that give the decoder information about the glottal characteristics of a target voice. This conversion procedure yields a filter block diagram suitable for real-time implementation, whose parameters can be adjusted to the performance of the DSP hardware at hand. A Simulink model of the voice converter that can be translated directly into DSP code is presented. Listening experiments are reported in which both non-expert and expert subjects rated the voice converter positively.
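
The encoder, coefficient transform, and decoder chain described in the abstract can be illustrated with a minimal sketch. It is not the authors' Simulink model: the function names are hypothetical, the LPC analysis uses the standard autocorrelation method with the Levinson-Durbin recursion, and `convert_coefficients` is an identity placeholder standing in for the trained Gaussian Mixture Model mapping, whose details the abstract does not give.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin.
    Returns a with a[0] = 1, so that A(z) = 1 + sum_k a[k] z^-k.
    Assumes the frame is not silent (r[0] > 0)."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error power
    return a

def analysis(frame, a):
    """Encoder side: inverse-filter the frame with A(z) to get the residual."""
    return np.convolve(frame, a)[:len(frame)]

def convert_coefficients(a):
    """Hypothetical placeholder for the source-to-target mapping.
    The paper uses trained Gaussian Mixture Models here; this sketch
    returns the coefficients unchanged."""
    return a.copy()

def synthesis(residual, a):
    """Decoder side: all-pole filter 1/A(z) driven by the residual."""
    order = len(a) - 1
    y = np.zeros(len(residual))
    for n in range(len(residual)):
        acc = residual[n]
        for k in range(1, min(order, n) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(256)        # stand-in for one speech frame
    a_src = lpc(frame, order=10)
    residual = analysis(frame, a_src)
    converted = synthesis(residual, convert_coefficients(a_src))
    print(np.allclose(converted, frame))
```

With the identity placeholder the decoder reconstructs the input frame, which is a convenient sanity check; a real converter would replace `convert_coefficients` with a learned mapping and process overlapping frames.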

F. Fontana | D. Gonzalez
