Voice Conversion Based on the HNT Model of Speech and Separate VQ Learning

This paper proposes a text-dependent voice conversion method based on the mapping-codebook approach. A critical task in any voice conversion framework is the estimation of speaker parameters. The proposed method relies on a Harmonic-Noise-Transient (HNT) decomposition of speech, so that each of the three components can be processed and converted separately. In informal listening tests, the proposed system performed better than an ACELP-based system.
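The mapping-codebook idea, applied separately to each HNT component, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Source and target frames are assumed to be time-aligned already (for instance by DTW over the parallel, text-dependent recordings), and the per-frame feature vectors are generic placeholders. The mapping codebook is built by averaging the aligned target frames that fall in each source cell, which simplifies the histogram-based mapping of classical VQ voice conversion. The component names `harmonic`, `noise` and `transient`, and all helper functions, are hypothetical. "Separate VQ learning" is read here as one independently trained codebook pair per component.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Model = Tuple[List[Vector], List[Vector]]  # (source codebook, mapping codebook)

# Hypothetical component names; each gets its own, independently trained codebooks.
COMPONENTS = ("harmonic", "noise", "transient")


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: Sequence[Vector], v: Sequence[float]) -> int:
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train_codebook(data: Sequence[Vector], size: int,
                   iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain k-means (Lloyd) VQ training; an empty cell keeps its old codeword."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(list(data), size)]
    for _ in range(iters):
        cells: List[List[Vector]] = [[] for _ in range(size)]
        for v in data:
            cells[nearest(codebook, v)].append(v)
        codebook = [mean(c) if c else codebook[i] for i, c in enumerate(cells)]
    return codebook


def train_mapping(src: Sequence[Vector], tgt: Sequence[Vector], size: int) -> Model:
    """Source codebook plus a mapping codebook built from aligned target frames.

    Assumes src[t] and tgt[t] are already time-aligned. Each mapping entry is the
    mean of the target frames whose source frame falls in that source cell.
    """
    if len(src) != len(tgt):
        raise ValueError("source and target frames must be time-aligned")
    src_cb = train_codebook(src, size)
    cells: Dict[int, List[Vector]] = {i: [] for i in range(size)}
    for s, t in zip(src, tgt):
        cells[nearest(src_cb, s)].append(t)
    fallback = mean(tgt)  # used for source cells with no aligned target frames
    map_cb = [mean(cells[i]) if cells[i] else fallback for i in range(size)]
    return src_cb, map_cb


def convert(frames: Sequence[Vector], src_cb: List[Vector],
            map_cb: List[Vector]) -> List[Vector]:
    """Replace each frame by the mapped target codeword of its source cell."""
    return [list(map_cb[nearest(src_cb, f)]) for f in frames]


def train_hnt(src_feats: Dict[str, List[Vector]], tgt_feats: Dict[str, List[Vector]],
              sizes: Dict[str, int]) -> Dict[str, Model]:
    """Train one independent codebook pair per HNT component."""
    return {c: train_mapping(src_feats[c], tgt_feats[c], sizes[c]) for c in COMPONENTS}


def convert_hnt(feats: Dict[str, List[Vector]],
                models: Dict[str, Model]) -> Dict[str, List[Vector]]:
    """Convert each component with its own codebook pair."""
    return {c: convert(feats[c], *models[c]) for c in COMPONENTS}
```

Keeping a separate codebook pair per component lets the codebook size differ between components (the `sizes` argument). For example, the sparse transient frames could use a smaller codebook than the harmonic part. Resynthesis from the converted HNT parameters is outside the scope of this sketch.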
