Measuring and modeling vocal source-tract interaction

The quality of synthetic speech depends on two factors: intelligibility and naturalness. At present, synthesized speech can be highly intelligible yet often sounds unnatural. Intelligibility depends on the synthesizer's ability to reproduce the formants, the formant bandwidths, and the formant transitions, whereas naturalness is thought to depend on the characteristics of the excitation waveform for voiced and unvoiced sounds. Voiced sounds may be generated by exciting the vocal tract filter with a quasiperiodic train of glottal pulses of specified shape. It is generally assumed that the glottal source and the vocal tract filter are linearly separable and do not interact. This assumption is often invalid, however, because appreciable source-tract interaction has been observed in natural speech. Previous speech synthesis experiments have demonstrated that the naturalness of synthetic speech improves when source-tract interaction is simulated in the synthesis process. The purpose of this paper is twofold: (1) to present an algorithm for automatically measuring source-tract interaction in voiced speech, and (2) to present a simple speech production model that incorporates source-tract interaction into the glottal source model. This glottal source model controls (1) the skewness of the glottal pulse and (2) the amount of first-formant ripple superimposed on the glottal pulse. A major application of the results of this paper is the modeling of vocal disorders.
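To make the two controls named above concrete, the following is a minimal sketch of a glottal pulse with adjustable skewness and first-formant ripple. It is not the model proposed in the paper: the raised-cosine pulse shape, the multiplicative sinusoidal ripple, and every name and default value (`glottal_pulse`, `pulse_train`, `open_quotient`, `skew`, `ripple_amp`, `f1_hz`, `fs`) are assumptions chosen only for illustration.

```python
import math


def glottal_pulse(n_period, open_quotient=0.6, skew=0.7,
                  ripple_amp=0.0, f1_hz=500.0, fs=10000.0):
    """Return one pitch period (n_period samples) of an illustrative glottal pulse.

    skew is the fraction of the open phase spent opening (hypothetical
    parameterization): values above 0.5 place the peak later in the pulse.
    ripple_amp scales a sinusoid at f1_hz that modulates the open phase,
    standing in for first-formant ripple (an assumption, not the paper's model).
    """
    n_open = int(round(open_quotient * n_period))   # samples with open glottis
    n_rise = max(1, int(round(skew * n_open)))      # opening-phase samples
    n_fall = max(1, n_open - n_rise)                # closing-phase samples
    pulse = []
    for n in range(n_period):
        if n < n_rise:
            # Opening phase: half raised cosine from 0 up to 1.
            g = 0.5 * (1.0 - math.cos(math.pi * n / n_rise))
        elif n < n_rise + n_fall:
            # Closing phase: quarter cosine from 1 down toward 0.
            g = math.cos(math.pi * (n - n_rise) / (2.0 * n_fall))
        else:
            # Closed phase: no flow.
            g = 0.0
        # Multiplicative ripple keeps the closed phase exactly zero.
        ripple = 1.0 + ripple_amp * math.sin(2.0 * math.pi * f1_hz * n / fs)
        pulse.append(g * ripple)
    return pulse


def pulse_train(n_periods, n_period, **kwargs):
    """Repeat one pulse to form a strictly periodic excitation train."""
    return glottal_pulse(n_period, **kwargs) * n_periods
```

In this sketch, raising `skew` moves the flow peak toward the end of the open phase, and a nonzero `ripple_amp` perturbs only the open phase. A train produced by `pulse_train` would then be passed through a vocal tract filter, which is omitted here.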