Some problems in voice source analysis

This is an overview of recent studies of voice source acoustics and of glottal flow analysis and modelling carried out at KTH. Time- and frequency-domain aspects of the production process are discussed with a view to relating glottal flow parameters obtained by inverse filtering, together with vocal tract transfer functions, to formant amplitudes and bandwidths. Alternative methods are discussed for determining the time constant Ta = 1/(2πFa) of the return phase of the glottal flow derivative after the instant of excitation, and thus the spectral tilt. Selective inverse filtering, which removes all but one formant, is potentially useful for this purpose. The influence of uncertainties in quantifying the vocal tract transfer function is exemplified by calculating the effect of treating the human head as a finite baffle, which adds a high-frequency emphasis above the standard +6 dB/octave. Particular attention is paid to temporal variations within an utterance, as derived from continuous inverse filtering. Aspects of breathy voicing and female-male differences in voice production are discussed. It is demonstrated that the temporal profile of the excitation amplitude, Ee(t), within an utterance from a male speaker can be approximated by the envelope of the negative part of the speech wave.
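The link between the return-phase time constant Ta and spectral tilt can be illustrated with a small numerical sketch. It assumes, as is common for glottal source models but not stated in the abstract, that the return phase behaves approximately like a first-order low-pass filter with cutoff frequency Fa = 1/(2πTa). The function names and the example values of Fa are hypothetical and chosen only for illustration; they are not measurements from the studies summarised here.

```python
import math


def ta_from_fa(fa_hz):
    """Return the time constant Ta (seconds) for a cutoff frequency Fa (Hz)."""
    return 1.0 / (2.0 * math.pi * fa_hz)


def fa_from_ta(ta_s):
    """Return the cutoff frequency Fa (Hz) for a time constant Ta (seconds)."""
    return 1.0 / (2.0 * math.pi * ta_s)


def tilt_db(f_hz, fa_hz):
    """Attenuation (dB) at frequency f of a first-order low-pass with cutoff Fa.

    Assumption: the return phase is modelled as a first-order low-pass filter,
    so the added tilt approaches -6 dB/octave well above Fa.
    """
    return -10.0 * math.log10(1.0 + (f_hz / fa_hz) ** 2)


if __name__ == "__main__":
    # Illustrative Fa values only, not data from the paper.
    for fa in (500.0, 1000.0, 2000.0):
        ta_ms = 1000.0 * ta_from_fa(fa)
        print(f"Fa = {fa:6.0f} Hz  Ta = {ta_ms:.3f} ms  "
              f"tilt at 3 kHz = {tilt_db(3000.0, fa):.1f} dB")
```

Under this assumption a shorter Ta corresponds to a higher Fa and therefore to less attenuation of the upper formants, which is why an estimate of Ta also serves as an estimate of spectral tilt.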
