Single-matrix formulation of a time domain acoustic model of the vocal tract with side branches

Although the piriform fossae have been found to play an important role in speech production and acoustics, the popular time domain articulatory synthesizer of [Maeda, S., 1982. A digital simulation method of the vocal-tract system. Speech Comm. 1 (3-4), 199-229] can currently include no more than one side branch to the acoustic tube that represents the main vocal tract. To overcome this limitation, in this paper we extend Maeda's (1982) simulation method by reformulating it mathematically as a single-matrix equation whose system matrix is both sparse and symmetric. Simulations using vocal tract area functions measured by MRI showed that the piriform fossae suppress energy at higher frequencies by introducing spectral zeros around 4-5 kHz, and also tend to lower the second formant of vowels. These spectral changes agree with results produced by a well-tested frequency domain transmission-line method, thus validating our new formulation of the time domain synthesizer. The reformulation can easily be extended to accommodate any number of vocal tract side branches, enabling more realistic, physiologically correct acoustic simulation of speech production.
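The abstract does not give the matrix equation itself. The sketch below illustrates, under stated assumptions, why a single system matrix can stay sparse and symmetric while admitting any number of side branches. It assumes a generic lumped-element network (one compliance per tube section, one inertance between neighbouring sections, a resistive load at the lips) integrated with backward Euler. This is a swapped-in textbook scheme, not Maeda's (1982) discretization or the authors' reformulation; the function names `add_tube`, `assemble`, and `step`, the constants, and the geometry are all hypothetical. Eliminating the volume velocities leaves one linear system per time step in the node pressures alone.

```python
import numpy as np

# Hypothetical physical constants in CGS units (assumed values).
RHO = 1.14e-3      # air density, g/cm^3
C_SOUND = 3.5e4    # speed of sound, cm/s


def add_tube(nodes, edges, sections, attach_to=None):
    """Append a tube given as (length_cm, area_cm2) sections.

    Each section becomes one pressure node; consecutive sections are
    joined by an inertance edge (a, b, L). If attach_to is a node index,
    the tube's first section is joined to that node, forming a side branch.
    Returns the indices of the tube's first and last nodes.
    """
    first = len(nodes)
    for i, (length, area) in enumerate(sections):
        nodes.append((length, area))
        idx = first + i
        prev = idx - 1 if i > 0 else attach_to
        if prev is not None:
            l_prev, a_prev = nodes[prev]
            # Half of each neighbouring section's acoustic mass rho*l/A.
            inertance = RHO * (l_prev / (2 * a_prev) + length / (2 * area))
            edges.append((prev, idx, inertance))
    return first, len(nodes) - 1


def assemble(nodes, edges, dt, loads):
    """Build the single system matrix for a backward-Euler pressure update:

        (C/dt + G + dt * D L^-1 D^T) p_new = C/dt * p - D u + s

    D is the node-by-edge incidence matrix, C the node compliances, L the
    edge inertances, and G resistive loads to ground (e.g. at the lips).
    """
    n, m = len(nodes), len(edges)
    D = np.zeros((n, m))
    inv_L = np.zeros(m)
    for k, (a, b, inertance) in enumerate(edges):
        D[a, k] = 1.0    # volume velocity u_k leaves node a ...
        D[b, k] = -1.0   # ... and enters node b
        inv_L[k] = 1.0 / inertance
    C = np.array([length * area / (RHO * C_SOUND**2) for length, area in nodes])
    G = np.zeros(n)
    for node, conductance in loads.items():
        G[node] = conductance
    # D * inv_L scales column k by 1/L_k, so this is D diag(1/L) D^T:
    # a weighted graph Laplacian, symmetric with one off-diagonal pair per edge.
    M = np.diag(C / dt + G) + dt * (D * inv_L) @ D.T
    return M, D, C, inv_L


def step(M, D, C, inv_L, dt, p, u, source):
    """Advance node pressures p and edge volume velocities u by one step.

    source is the volume velocity injected at each node. A dense solve is
    used for brevity; a sparse solver would exploit M's structure.
    """
    rhs = C / dt * p - D @ u + source
    p_new = np.linalg.solve(M, rhs)
    u_new = u + dt * inv_L * (D.T @ p_new)
    return p_new, u_new


if __name__ == "__main__":
    # Hypothetical geometry: 17 cm uniform main tract (3 cm^2) plus a
    # 2 cm, 0.5 cm^2 closed side branch attached near the glottal end.
    nodes, edges = [], []
    glottis, lips = add_tube(nodes, edges, [(0.5, 3.0)] * 34)
    add_tube(nodes, edges, [(0.5, 0.5)] * 4, attach_to=2)
    dt = 1.0 / 44100
    M, D, C, inv_L = assemble(nodes, edges, dt, {lips: 3.0 / (RHO * C_SOUND)})
    print(M.shape, np.allclose(M, M.T), np.count_nonzero(M))
```

Attaching another side branch is just another `add_tube(..., attach_to=k)` call: it appends rows and columns to `M` with one symmetric off-diagonal pair per new connection, so the sparsity pattern mirrors the tube topology. With the hypothetical geometry in the `__main__` block, `M` is 38 by 38, symmetric, and has 112 nonzero entries (38 diagonal plus two per inertance edge).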

[1] M. T. Jackson, et al. Verifying a vocal tract model with a closed side-branch. The Journal of the Acoustical Society of America, 2001.

[2] Gunnar Fant, et al. Acoustic Theory of Speech Production, 1960.

[3] Shinobu Masaki, et al. Measurement of temporal changes in vocal tract area function from 3D cine-MRI data. The Journal of the Acoustical Society of America, 2006.

[4] Kiyoshi Honda, et al. Exploring Human Speech Production Mechanisms by MRI. IEICE Trans. Inf. Syst., 2004.

[5] J. Flanagan. Speech Analysis, Synthesis and Perception, 1971.

[6] Kiyoshi Honda, et al. Principal components of vocal-tract area functions and inversion of vowels by linear regression of cepstrum coefficients. J. Phonetics, 2007.

[7] J. Dang, et al. Morphological and acoustical analysis of the nasal and the paranasal cavities. The Journal of the Acoustical Society of America, 1994.

[8] Robert I. Damper, et al. Prospects for articulatory synthesis: A position paper. SSW, 2001.

[9] J. L. Flanagan, et al. Synthesis of speech from a dynamic model of the vocal cords and vocal tract. The Bell System Technical Journal, 1975.

[10] J. Flanagan, et al. Synthesis of voiced sounds from a two-mass model of the vocal cords, 1972.

[11] Man Mohan Sondhi, et al. A hybrid time-frequency domain articulatory speech synthesizer. IEEE Trans. Acoust. Speech Signal Process., 1987.

[12] Kiyoshi Honda, et al. Acoustic roles of the laryngeal cavity in vocal tract resonance. The Journal of the Acoustical Society of America, 2006.

[13] Toshio Hirai. Optimization of target cost weights in concatenative speech synthesis with very short segments of 5-ms duration, 2006.

[14] C. H. Coker, et al. Synthetic voices for computers. IEEE Spectrum, 1970.

[15] K. Honda, et al. Acoustic characteristics of the piriform fossa in models and humans. The Journal of the Acoustical Society of America, 1997.

[16] Peter Birkholz, et al. Influence of temporal discretization schemes on formant frequencies and bandwidths in time domain simulations of the vocal tract system. INTERSPEECH, 2004.

[17] Peter Birkholz, et al. Construction and Control of a Three-Dimensional Vocal Tract Model. 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, 2006.

[18] Toshio Hirai, et al. An MRI-based time-domain speech synthesis system, 2006.

[19] Shinji Maeda, et al. A digital simulation method of the vocal-tract system. Speech Commun., 1982.

[20] Olov Engwall, et al. Combining MRI, EMA and EPG measurements in a three-dimensional tongue model. Speech Commun., 2003.

[21] Kiyoshi Honda, et al. Individual variation of the hypopharyngeal cavities and its acoustic effects, 2005.

[22] S. Adachi, et al. An acoustical study of sound production in biphonic singing, Xöömij. The Journal of the Acoustical Society of America, 1999.