A flow waveform-matched low-dimensional glottal model based on physical knowledge.

The purpose of this study is to explore whether physically based mathematical models of the voice source can accurately reproduce inverse-filtered glottal volume-velocity waveforms. A low-dimensional, self-oscillating model of the glottal source with waveform-matching properties is proposed. The model relies on a lumped mechano-aerodynamic scheme loosely inspired by one-mass and multimass lumped models. The vocal folds are represented by a single mechanical resonator and a propagation line that accounts for vertical phase differences. The vocal-fold displacement is coupled to the glottal flow by an aerodynamic driving block that includes a general parametric nonlinear component. The principal characteristics of the flow-induced oscillations are retained, and the overall model is able to match inverse-filtered glottal flow signals. In principle, the method makes it possible to transform the glottal flow by acting on the physiologically based parameters of the model, a desirable property for applications such as speech synthesis. The model was tested on a data set of inverse-filtered glottal flow waveforms with different characteristics. The results demonstrate that natural speech waveforms can be reproduced with high accuracy and that important characteristics of the synthesis, such as pitch, can be controlled.
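To make the structure described above concrete, the following is a minimal sketch of a one-resonator glottal source with a delay line, written under several assumptions that are not taken from the paper. The function name `simulate_glottal_flow`, all parameter values, the pure-delay form of the propagation line, and the Bernoulli-type driving pressure (which stands in for the paper's general parametric nonlinear block) are illustrative choices only. Whether and how the sketch oscillates depends on those parameter values; it is not the authors' model and performs no waveform matching.

```python
import math


def simulate_glottal_flow(n_samples=4410, fs=44100.0, ps=800.0, tau=1.0e-3):
    """Return a list of glottal flow samples (m^3/s) from a toy lumped model.

    All numeric values below are assumed, illustrative magnitudes.
    """
    m, r, k = 1.0e-4, 0.02, 40.0                   # mass (kg), damping (N s/m), stiffness (N/m)
    length, thickness, x0 = 0.014, 0.003, 2.0e-4   # fold length, thickness, rest half-width (m)
    rho = 1.2                                      # air density (kg/m^3)
    dt = 1.0 / fs
    delay = max(1, int(round(tau * fs)))           # propagation line as a pure delay (assumption)
    history = [0.0] * delay                        # circular buffer of past displacements
    x, v = 0.0, 0.0                                # displacement and velocity of the resonator
    flow = []
    for n in range(n_samples):
        # Upper-edge displacement is the lower-edge displacement delayed by tau.
        x_upper = history[n % delay]
        a_lower = max(0.0, 2.0 * length * (x0 + x))
        a_upper = max(0.0, 2.0 * length * (x0 + x_upper))
        # Quasi-steady Bernoulli flow through the narrower section (no vocal tract load).
        flow.append(min(a_lower, a_upper) * math.sqrt(2.0 * ps / rho))
        # Assumed driving block: mean intraglottal pressure depends on the area ratio,
        # clamped to [-ps, ps]; during closure the subglottal pressure pushes the folds apart.
        if a_lower > 0.0:
            p_mean = max(-ps, min(ps, ps * (1.0 - a_upper / a_lower)))
        else:
            p_mean = ps
        force = p_mean * length * thickness
        history[n % delay] = x                     # store current displacement for later reads
        # Semi-implicit Euler update of the mechanical resonator.
        v += dt * (force - r * v - k * x) / m
        x += dt * v
    return flow


if __name__ == "__main__":
    u = simulate_glottal_flow()
    print(len(u), min(u), max(u))
```

In this sketch, `ps` (subglottal pressure) and `tau` (vertical propagation delay) play the role of physiologically motivated control parameters: changing them alters the simulated flow, which is the kind of parameter-level manipulation the abstract refers to.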
