Voice transformation using PSOLA technique

Abstract: In this contribution, a new system for voice conversion is described. The proposed architecture combines a synthesizer derived from PSOLA (Pitch Synchronous Overlap and Add) with a spectral transformation module. Because the synthesizer is based on the classical source-filter decomposition, prosodic and spectral transformations can be performed independently. Prosodic modifications are applied to the excitation signal using the TD-PSOLA scheme; the converted speech is then synthesized from the transformed spectral parameters. Two approaches to deriving the spectral transformations, both borrowed from the speech-recognition domain, are compared: Linear Multivariate Regression (LMR) and Dynamic Frequency Warping (DFW). Vector quantization is carried out as a preliminary stage so that the spectral transformations depend on the acoustic realization of the sounds. A formal listening test shows that the synthesizer produces a satisfyingly natural “transformed” voice. LMR also proves to yield a slightly better conversion than DFW. There is nevertheless still room for improvement in the spectral transformation stage.

Eric Moulines | Jean-Pierre Tubach | Hélène Valbret
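
The TD-PSOLA pitch modification mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the analysis pitch marks are already known (one per period; how they are detected is outside this sketch), uses two-period Hann windows centred on each mark, and, for every synthesis instant spaced by the local period divided by a pitch factor, overlap-adds the segment centred on the nearest analysis mark. The names `hann` and `td_psola_pitch` are hypothetical. In the system described above the scheme is applied to the excitation signal, whereas this sketch simply operates on whatever signal it is given; it is an illustration, not the authors' implementation.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def td_psola_pitch(x, marks, pitch_factor):
    """Scale F0 by `pitch_factor` with TD-PSOLA, keeping the overall duration.

    x:     list of samples (the signal to modify)
    marks: sorted analysis pitch marks (sample indices), assumed given
    """
    if len(marks) < 2:
        raise ValueError("need at least two pitch marks")
    # Local period at each analysis mark: distance to the next mark.
    periods = [b - a for a, b in zip(marks, marks[1:])]
    periods.append(periods[-1])

    y = [0.0] * len(x)
    t = float(marks[0])  # current synthesis instant
    while t < marks[-1]:
        # Reuse the analysis segment whose mark is closest to t.
        i = min(range(len(marks)), key=lambda j: abs(marks[j] - t))
        p = periods[i]
        win = hann(2 * p)  # two-period window, peak (k = p) on the mark
        out_centre = int(round(t))
        for k in range(2 * p):
            src = marks[i] - p + k
            dst = out_centre - p + k
            if 0 <= src < len(x) and 0 <= dst < len(y):
                y[dst] += win[k] * x[src]
        # Synthesis marks are spaced by the modified period.
        t += p / pitch_factor
    return y
```

For example, an impulse train with marks every 100 samples and `pitch_factor=2.0` yields impulses every 50 samples over the same span: F0 is doubled while the duration is preserved, because segments are duplicated rather than the time axis being rescaled.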
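
The combination of vector quantization and LMR can be sketched as one affine least-squares map per VQ class of the source vectors. This sketch assumes that source and target spectral vectors have already been time-aligned into pairs (for instance by dynamic time warping), substitutes plain k-means with a deterministic farthest-point initialisation for the codebook design, and fits each map through the normal equations. All function names are hypothetical, and the spectral parameters and training details actually used in the paper are not given in the abstract.

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to v (the VQ class of v)."""
    return min(range(len(codebook)), key=lambda c: dist2(codebook[c], v))


def kmeans(vectors, k, iters=20):
    """k-means codebook, used here as a stand-in for the VQ design step."""
    # Deterministic farthest-point initialisation.
    codebook = [list(vectors[0])]
    while len(codebook) < k:
        far = max(vectors, key=lambda v: min(dist2(c, v) for c in codebook))
        codebook.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(codebook, v)].append(v)
        for c, g in enumerate(groups):
            if g:  # keep the old codeword if a cell is empty
                codebook[c] = [sum(col) / len(g) for col in zip(*g)]
    return codebook


def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + rhs[:] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [[x / m[i][i] for x in m[i][n:]] for i in range(n)]


def fit_affine(src, tgt):
    """Least-squares W such that tgt ~= [src, 1] W (normal equations)."""
    xs = [list(s) + [1.0] for s in src]
    d, e = len(xs[0]), len(tgt[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    xty = [[sum(x[i] * y[j] for x, y in zip(xs, tgt)) for j in range(e)]
           for i in range(d)]
    return solve(xtx, xty)


def train_lmr(src, tgt, k):
    """One affine regression per VQ class of the source vectors."""
    codebook = kmeans(src, k)
    labels = [nearest(codebook, s) for s in src]
    maps = []
    for c in range(k):
        pairs = [(s, t) for s, t, lab in zip(src, tgt, labels) if lab == c]
        maps.append(fit_affine([s for s, _ in pairs], [t for _, t in pairs]))
    return codebook, maps


def convert(v, codebook, maps):
    """Transform a source vector with the map of its VQ class."""
    w = maps[nearest(codebook, v)]
    x = list(v) + [1.0]
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]
```

Making the regression class-dependent lets each map specialise to one region of the acoustic space, which is the role the abstract assigns to the VQ stage. Each class needs enough non-degenerate pairs (at least one more than the vector dimension) for the normal equations to be solvable.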
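
Dynamic Frequency Warping aligns the frequency axes of two spectral envelopes with the same dynamic-programming recursion that DTW applies along time. The sketch below assumes envelopes sampled on a common set of frequency bins (e.g. log-magnitude values), an absolute-difference local cost and the usual (1,0), (0,1), (1,1) steps; in a class-dependent setup, one warping path would presumably be derived per VQ class from class-average envelopes. The names `dfw_path` and `apply_warp` and the cost choices are assumptions, not details taken from the paper.

```python
def dfw_path(src, tgt):
    """Optimal monotonic alignment of source bins to target bins.

    Local cost is |src[i] - tgt[j]|; allowed steps are (1,0), (0,1), (1,1).
    Returns the path as a list of (source_bin, target_bin) pairs.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(src[i] - tgt[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            best = min(
                cost[i - 1][j] if i > 0 else inf,
                cost[i][j - 1] if j > 0 else inf,
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            cost[i][j] = d + best
    # Backtrack from the final cell along minimal cumulative cost.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        steps = []
        if i > 0 and j > 0:
            steps.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            steps.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            steps.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(steps)
        path.append((i, j))
    return path[::-1]


def apply_warp(spectrum, path, m):
    """Resample a source spectrum onto m target bins along the warping path,
    averaging when several source bins map to the same target bin."""
    total = [0.0] * m
    count = [0] * m
    for i, j in path:
        total[j] += spectrum[i]
        count[j] += 1
    return [s / c for s, c in zip(total, count)]
```

With a spectral peak at bin 3 in the source and at bin 5 in the target, the path maps source bin 3 onto target bin 5, and the warped source reproduces the target exactly; real envelopes would leave a non-zero residual, since a frequency warp can move spectral peaks but cannot change their relative amplitudes.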