Towards flexible speech coding for speech synthesis: an LF + modulated noise vocoder

This paper presents a speech model based on an ARX-LF source-filter decomposition (an autoregressive vocal-tract filter with exogenous input, excited by a Liljencrants-Fant glottal source) that is amenable to low-bit-rate quantization and to speech modification directly in the parametric domain. The model captures the non-deterministic part of voiced speech by modulating noise with the glottal flow, while unvoiced speech and transients are synthesized by modulating noise with a time envelope derived from the signal. The result is a high-quality vocoder for low-complexity coding, synthesis, and modification of speech, suitable for embedded text-to-speech applications.
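
The two noise-modulation schemes described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. It generates one LF period using the standard closed-form open and return phases with tc = T0, solving for the return-phase constant eps and the growth factor alpha by bisection. All function names, the LF parameter values, the noise gain, and the first-order all-pole "vocal tract" are hypothetical. The moving average of |x| is a swapped-in envelope estimator, because the abstract does not say how the time envelope is derived.

```python
import math
import random

def _bisect(f, lo, hi, iters=200):
    """Root of f in [lo, hi]; assumes f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid < 0) == (flo < 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lf_derivative(fs, T0, tp, te, ta, Ee=1.0):
    """One period of the LF glottal-flow derivative, assuming tc = T0."""
    tc = T0
    wg = math.pi / tp  # open-phase sinusoid crosses zero (flow peak) at tp
    # Return-phase constant: non-trivial root of eps*ta = 1 - exp(-eps*(tc - te)).
    eps = _bisect(lambda e: e * ta - 1.0 + math.exp(-e * (tc - te)), 1e-6 / ta, 1.0 / ta)
    decay = math.exp(-eps * (tc - te))
    ret_area = -(Ee / (eps * ta)) * ((1.0 - decay) / eps - (tc - te) * decay)
    s, c = math.sin(wg * te), math.cos(wg * te)

    def open_area(alpha):
        # Closed-form open-phase integral, with E0 fixed by continuity E(te) = -Ee.
        return -Ee * ((alpha * s - wg * c) + wg * math.exp(-alpha * te)) / (s * (alpha**2 + wg**2))

    # Zero net flow over one period determines the growth factor alpha.
    alpha = _bisect(lambda a: open_area(a) + ret_area, -50.0 / te, 50.0 / te)
    E0 = -Ee / (math.exp(alpha * te) * s)
    out = []
    for n in range(int(round(fs * T0))):
        t = n / fs
        if t <= te:  # open phase: exponentially growing sinusoid
            out.append(E0 * math.exp(alpha * t) * math.sin(wg * t))
        else:        # return phase: exponential recovery to zero at tc
            out.append(-(Ee / (eps * ta)) * (math.exp(-eps * (t - te)) - decay))
    return out

def glottal_flow(dUdt, fs):
    """Integrate the flow derivative (running sum) to get the glottal flow."""
    flow, acc = [], 0.0
    for v in dUdt:
        acc += v / fs
        flow.append(acc)
    return flow

def modulate(noise, envelope):
    """Amplitude-modulate a noise sequence sample by sample."""
    return [n * e for n, e in zip(noise, envelope)]

def time_envelope(x, win):
    """Centered moving average of |x| (hypothetical stand-in envelope estimator)."""
    half = win // 2
    env = []
    for n in range(len(x)):
        seg = x[max(0, n - half): n + half + 1]
        env.append(sum(abs(v) for v in seg) / len(seg))
    return env

def all_pole(x, a):
    """Filter x through 1 / (1 + a[0] z^-1 + a[1] z^-2 + ...)."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y

fs = 16000
# Hypothetical LF parameters for a 100 Hz voice, chosen for illustration only.
dU = lf_derivative(fs, T0=0.01, tp=0.0045, te=0.006, ta=0.0002)
U = glottal_flow(dU, fs)
peak = max(U)
shape = [max(u, 0.0) / peak for u in U]  # non-negative, unit-peak flow
rng = random.Random(0)
voiced_noise = modulate([rng.gauss(0.0, 1.0) for _ in dU], shape)
voiced_src = [d + 0.05 * v for d, v in zip(dU, voiced_noise)]  # 0.05: hypothetical gain
speech = all_pole(voiced_src, [-0.9])  # hypothetical 1st-order stand-in for the AR part

# Unvoiced/transient frame: noise shaped by an envelope measured on a (synthetic) signal.
burst = [math.exp(-n / 40.0) * rng.gauss(0.0, 1.0) for n in range(160)]
unvoiced = modulate([rng.gauss(0.0, 1.0) for _ in burst], time_envelope(burst, win=32))
```

Normalizing the flow to unit peak and clipping it at zero keeps the voiced noise silent during the closed phase and strongest around the flow maximum, which is one straightforward reading of "modulating noise with the glottal flow". Other envelope estimators (for example an analytic-signal magnitude) could replace the moving average without changing the rest of the sketch.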
