Paper information - Towards flexible speech coding for speech synthesis: an LF + modulated noise vocoder

This paper presents an ARX-LF-based model of speech that is amenable to low-bit-rate quantization and to speech modifications made directly in the parametric domain. The new model addresses the non-deterministic part of voiced speech by modulating noise with the glottal flow, while unvoiced speech and transients are synthesized by modulating noise with a signal-derived time envelope. The presented work is essentially a high-quality vocoder that can be used for low-complexity coding, synthesis, and modification of speech, and is suitable for embedded text-to-speech applications.
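The two noise-modulation ideas mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the raised-cosine pulse used in place of an LF glottal flow waveform, the moving-average envelope estimator, and all parameter values are assumptions chosen only to show white noise being shaped by a time-domain envelope.

```python
import numpy as np

def time_envelope(signal, win=32):
    """Crude amplitude envelope (assumed estimator): rectify, then moving average."""
    kernel = np.ones(win) / win
    return np.convolve(np.abs(signal), kernel, mode="same")

def modulate_noise(envelope, seed=0):
    """Shape unit-variance white Gaussian noise with a non-negative envelope."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(envelope))
    return envelope * noise

def toy_glottal_envelope(n_samples, f0, fs):
    """Hypothetical stand-in for a glottal flow waveform (NOT the LF model):
    one raised-cosine pulse per pitch period."""
    phase = (np.arange(n_samples) * f0 / fs) % 1.0
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * phase))

fs = 16000  # assumed sampling rate in Hz

# Voiced case: noise modulated pitch-synchronously by the toy glottal pulse.
voiced_noise = modulate_noise(toy_glottal_envelope(fs // 10, f0=100.0, fs=fs))

# Unvoiced/transient case: noise modulated by an envelope derived from a signal,
# here a synthetic burst standing in for a real speech segment.
burst = np.concatenate([np.zeros(400), np.hanning(200), np.zeros(400)])
unvoiced = modulate_noise(time_envelope(burst))
```

In the voiced case the noise energy rises and falls once per pitch period, while in the unvoiced case it follows the envelope of the analyzed segment, so the noise is silent wherever the source segment is silent.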

Olivier Rosec | Yannis Agiomyrgiannakis

[1] Slava Shechtman, et al. Small footprint concatenative text-to-speech synthesis system using complex spectral envelope modeling, 2005, INTERSPEECH.

[2] D. Mehta, et al. Synthesis, analysis, and pitch modification of the breathy vowel, 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[3] Zhiwei Shuang, et al. High Quality Sinusoidal Modeling of Wideband Speech for the Purposes of Speech Synthesis and Modification, 2006, IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[4] Jan Skoglund, et al. On time-frequency masking in voiced speech, 2000, IEEE Trans. Speech Audio Process.

[5] Eric Moulines, et al. Non-parametric techniques for pitch-scale and time-scale modification of speech, 1995, Speech Commun.

[6] Yannis Stylianou, et al. Stochastic Modeling and Quantization of Harmonic Phases in Speech using Wrapped Gaussian Mixture Models, 2007, IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[7] Yannis Stylianou, et al. Harmonic plus noise models for speech, combined with statistical methods, for speech and speaker modification, 1996.

[8] Hideki Kasuya, et al. Simultaneous Estimation of Vocal Tract and Voice Source Parameters Based on an ARX Model, 1995, IEICE Trans. Inf. Syst.

[9] Sylvain Le Beux, et al. The Speech Conductor: Gestural Control of Speech Synthesis, 2005.

[10] Laurent Girin, et al. Perceptually weighted long term modeling of sinusoidal speech amplitude trajectories, 2005, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05).

[11] Damien Vincent. Analyse et contrôle du signal glottique en synthèse de la parole, 2007.

[12] Hugo Van hamme, et al. DCT-Based Amplitude and Frequency Modulated Harmonic-Plus-Noise Modelling for Text-to-Speech Synthesis, 2007, IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[13] X. Rodet. Efficient Spectral Envelope Estimation and Its Application to Pitch Shifting and Envelope Preservation, 2005.

[14] Myriam Desainte-Catherine, et al. Adapting the Overlap-Add Method to the Synthesis of Noise, 2002.

[15] Olivier Rosec, et al. A New Method for Speech Synthesis and Transformation Based on an ARX-LF Source-Filter Decomposition and HNM Modeling, 2007, IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[16] J. Liljencrants, et al. A Four-Parameter Model of Glottal Flow, 2022, Dept. for Speech, Music and Hearing Quarterly Progress and Status Report.