Statistical Approach for Voice Personality Transformation

A voice transformation method which changes the source speaker's utterances so as to sound similar to those of a target speaker is described. Speaker individuality transformation is achieved by altering the LPC cepstrum, average pitch period and average speaking rate. The main objective of the work involves building a nonlinear relationship between the parameters for the acoustical features of two speakers, based on a probabilistic model. The conversion rules involve the probabilistic classification and a cross correlation probability between the acoustic features of the two speakers. The parameters of the conversion rules are estimated by estimating the maximum likelihood of the training data. To obtain transformed speech signals which are perceptually closer to the target speaker's voice, prosody modification is also involved. Prosody modification is achieved by scaling excitation spectrum and time scale modification with appropriate modification factors. An evaluation by objective tests and informal listening tests clearly indicated the effectiveness of the proposed transformation method. We also confirmed that the proposed method leads to smoothly evolving spectral contours over time, which, from a perceptual standpoint, produced results that were superior to conventional vector quantization (VQ)-based methods

[1]  Bayya Yegnanarayana,et al.  Transformation of formants for voice conversion using artificial neural networks , 1995, Speech Commun..

[2]  Harry L. Van Trees,et al.  Detection, Estimation, and Modulation Theory, Part I , 1968 .

[3]  Robert M. Gray,et al.  An Algorithm for Vector Quantizer Design , 1980, IEEE Trans. Commun..

[4]  Kevin Barraclough,et al.  I and i , 2001, BMJ : British Medical Journal.

[5]  A. Wilgus,et al.  High quality time-scale modification for speech , 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  Xerox Corpora,et al.  Speech Recognition Experiments with Linear Predication, Bandpass Filtering, and Dynamic Programming , 1975 .

[7]  Eric Moulines,et al.  Voice transformation using PSOLA technique , 1991, Speech Commun..

[8]  B. Yegnanarayana,et al.  Voice conversion: Factors responsible for quality , 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9]  M. Melamed Detection , 2021, SETI: Astronomy as a Contact Sport.

[10]  M.G. Bellanger,et al.  Digital processing of speech signals , 1980, Proceedings of the IEEE.

[11]  M. Richards Helium speech enhancement using the short-time Fourier transform , 1982 .

[12]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[13]  이기승,et al.  낮은 차원의 벡터 변환을 통한 음성 변환 ( Voice Conversion Using Low Dimensional Vector Mapping ) , 1998 .

[14]  Eric Moulines,et al.  Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones , 1989, Speech Commun..

[15]  Eric Moulines,et al.  Non-parametric techniques for pitch-scale and time-scale modification of speech , 1995, Speech Commun..

[16]  Michael Savic,et al.  Voice personality transformation , 1991, Digit. Signal Process..

[17]  Levent M. Arslan,et al.  Speaker Transformation Algorithm using Segmental Codebooks (STASC) , 1999, Speech Commun..

[18]  W. Bastiaan Kleijn,et al.  Spectral dynamics is more important than spectral distortion , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[19]  G. White,et al.  Speech recognition experiments with linear predication, bandpass filtering, and dynamic programming , 1976 .

[20]  Thilo Pfau,et al.  Estimating the speaking rate by vowel detection , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[21]  Masanobu Abe,et al.  Voice conversion algorithm based on piecewise linear conversion rules of formant frequency and spectrum tilt , 1995, Speech Commun..

[22]  Yoshinori Sagisaka,et al.  Speech spectrum conversion based on speaker interpolation and multi-functional representation with weighting by radial basis function networks , 1995, Speech Commun..

[23]  Aaas News,et al.  Book Reviews , 1893, Buffalo Medical and Surgical Journal.

[24]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[25]  Stephen Cox,et al.  Unsupervised speaker adaptation by probabilistic spectrum fitting , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[26]  Satoshi Nakamura,et al.  Voice conversion through vector quantization , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[27]  Ning Bi,et al.  Application of speech conversion to alaryngeal speech enhancement , 1997, IEEE Trans. Speech Audio Process..

[28]  Eric Moulines,et al.  Continuous probabilistic transform for voice conversion , 1998, IEEE Trans. Speech Audio Process..