Studies on Glottal Source and Formant Trajectory Models for the Synthesis of High Quality Speech

Publisher Summary: This chapter describes a polynomial voice source model and a formant trajectory model for a high-fidelity speech synthesizer. The voice source model represents the time derivative of the glottal volume velocity waveform as a polynomial function. The formant model describes each formant trajectory as a sum of temporal functions: a second-order delay function, which represents vowel-to-vowel transitions, and two first-order delay functions, which represent the effects of the surrounding consonants on the vowel formant trajectory. The models were tested in perceptual experiments on synthetic speech at slow and fast speaking rates. The results suggest that the models work well, particularly at slow rates. Additional strategies appear to be needed to improve the intelligibility of consonants at fast rates.
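
As an illustration of the formant model summarized here, the following minimal Python sketch adds one second-order term and two first-order terms. The summary does not give the chapter's actual equations, so everything specific in the sketch is an assumption: the critically damped form of the second-order step response, the exponential form of the consonant terms, and all function names, parameter names, and numeric values are hypothetical.

```python
import math


def second_order_step(t, tau):
    """Step response of a critically damped second-order lag (assumed form)."""
    if t <= 0.0:
        return 0.0
    x = t / tau
    return 1.0 - (1.0 + x) * math.exp(-x)


def first_order_decay(t, tau):
    """Exponentially decaying first-order term; zero before t = 0 (assumed form)."""
    if t < 0.0:
        return 0.0
    return math.exp(-t / tau)


def formant_trajectory(t, f_from, f_to, t_onset, tau_vowel,
                       dev_pre, t_release, dev_post, t_closure, tau_cons):
    """Formant frequency (Hz) at time t (s) as a sum of three temporal terms.

    All parameters are hypothetical illustrations:
      f_from, f_to      vowel targets before and after the transition
      t_onset           start time of the vowel-to-vowel transition
      dev_pre, dev_post formant deviations caused by the preceding and
                        following consonants
      t_release         release time of the preceding consonant
      t_closure         closure time of the following consonant
    """
    # Vowel-to-vowel transition: second-order delay function.
    vowel = f_from + (f_to - f_from) * second_order_step(t - t_onset, tau_vowel)
    # Preceding consonant: deviation decays after the release.
    pre = dev_pre * first_order_decay(t - t_release, tau_cons)
    # Following consonant: deviation grows as the closure approaches.
    post = dev_post * first_order_decay(t_closure - t, tau_cons)
    return vowel + pre + post


if __name__ == "__main__":
    # Sample a hypothetical second-formant contour every 20 ms.
    for i in range(11):
        t = 0.02 * i
        f2 = formant_trajectory(t, 1200.0, 1800.0, 0.05, 0.03,
                                -300.0, 0.0, -250.0, 0.2, 0.02)
        print(f"{t:.2f} s  {f2:7.1f} Hz")
```

In this sketch, shortening `tau_vowel` and `tau_cons` or moving `t_release` and `t_closure` closer together is one simple way to mimic a faster speaking rate, although the summary does not say how the chapter handles rate.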
