Paper information - Codec integrated voice conversion for embedded speech synthesis

Voice conversion technologies transform the individual characteristics of a speaker's voice while preserving the original content, and can be widely used in speech processing. Given limited system resources, particularly in embedded concatenative speech synthesis, voice conversion may reduce the memory consumption of the acoustic database. Voice conversion enables the intra-gender or cross-gender generation of new voices from an existing high-quality voice. Usually, voice conversion is based on the modification of spectral properties combined with pitch manipulation. A simplified approach uses warping functions in the frequency domain that aim at a reverse vocal tract length normalization (VTLN). Even so, voice conversion itself introduces critical computational complexity, which conflicts with the practical constraints of typical embedded and mobile applications. The authors propose a novel approach to voice conversion that re-uses features of a common speech codec. Such a codec is already available in typical mobile applications, and the resulting voice quality is widely accepted. The paper investigates the manipulation of the immittance spectral frequencies (ISF) provided by the Adaptive Multi-Rate Wideband codec (AMR-WB). This algorithm has been integrated into the embedded speech synthesizer microDRESS.
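
To make the idea of frequency-domain warping concrete, the following minimal sketch applies a piecewise-linear warping function to a list of spectral frequencies. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not specify the warping function, so a generic piecewise-linear form is used here. The names `warp_frequency` and `warp_isf`, the warping factor `alpha`, the `knee` parameter, and the upper band edge `f_max = 6400.0` Hz are all hypothetical choices, and the input is a plain list of frequencies in Hz standing in for ISF values rather than the codec's actual parameter layout.

```python
def warp_frequency(f, alpha, f_max=6400.0, knee=0.8):
    """Piecewise-linear warp of a frequency f (Hz) within [0, f_max].

    Assumption: below a breakpoint the frequency is scaled by alpha;
    above it, a second linear segment maps f_max onto itself so the
    warped value never leaves the band.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Breakpoint chosen so that alpha * f0 always stays below f_max.
    f0 = knee * f_max * min(1.0, 1.0 / alpha)
    if f <= f0:
        return alpha * f
    # Second segment joins (f0, alpha * f0) to (f_max, f_max).
    slope = (f_max - alpha * f0) / (f_max - f0)
    return alpha * f0 + slope * (f - f0)


def warp_isf(isf_hz, alpha, f_max=6400.0):
    """Warp each frequency in the list; the ordering is preserved
    because both segments of the warping function have positive slope."""
    return [warp_frequency(f, alpha, f_max) for f in isf_hz]


# Hypothetical frequency values in Hz, chosen only for illustration.
example_isf = [400.0, 1200.0, 2500.0, 4000.0, 5800.0]
warped = warp_isf(example_isf, alpha=1.1)
print(warped)
```

In this sketch, a factor `alpha` greater than 1 shifts the frequencies upward and a factor below 1 shifts them downward, while `alpha = 1` leaves them unchanged. Operating on a short parameter vector instead of a full spectrum is what keeps the per-frame cost low in this illustration.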

Rüdiger Hoffmann | Guntram Strecha | Oliver Jokisch | Matthias Eichner
