A Study of Mandarin Chinese Using X-Ray and MRI

This paper describes a preliminary study toward a dynamic articulatory model that combines MRI and X-ray data. MRI is used to refine the detailed, realistic shape of the vocal tract, and the X-ray data provide the dynamic information of articulation. MRI experiments were conducted to obtain the 3D static morphology of nine single vowels of Mandarin, and the vocal tract shapes were analyzed. From the MRI data, a set of coefficients was derived for the alpha-beta model, which maps midsagittal distances to cross-sectional areas. The articulatory movements were obtained from a Mandarin X-ray video (cineradiography) database, the only such corpus available for Mandarin, and the cross-sectional areas were calculated using the MRI-based alpha-beta coefficients. For evaluation, the formants estimated from the MRI-based and the X-ray-based vocal tract area functions were compared with those measured from real speech. The estimates agreed with the real speech, with mismatches of about 10% and 15%, respectively.
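The abstract does not give the exact form of the alpha-beta model or of the formant computation. The following is a minimal sketch of such a pipeline under stated assumptions, not the authors' implementation. It assumes the usual power-law form A = αd^β, fitted by least squares in log space, with one (α, β) pair per vocal-tract region. It also assumes a lossless concatenated-tube model with a volume-velocity source at the glottis and an ideal open end at the lips, a sound speed of 35000 cm/s, and a "mismatch" defined as the mean relative formant error. All function names (`fit_alpha_beta`, `distances_to_areas`, `glottal_term`, `formants`, `mean_mismatch_percent`) are hypothetical.

```python
import math

# Assumption: speed of sound in warm, humid air, in cm/s.
SOUND_SPEED_CM_S = 35000.0


def fit_alpha_beta(distances_cm, areas_cm2):
    """Fit A = alpha * d**beta by least squares on log A = log alpha + beta * log d.

    Assumption: one (alpha, beta) pair for the region the samples come from.
    """
    xs = [math.log(d) for d in distances_cm]
    ys = [math.log(a) for a in areas_cm2]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = math.exp(mean_y - beta * mean_x)
    return alpha, beta


def distances_to_areas(distances_cm, alpha, beta):
    """Map midsagittal distances (e.g. from X-ray tracings) to cross-sectional areas."""
    return [alpha * d ** beta for d in distances_cm]


def glottal_term(freq_hz, areas_cm2, section_len_cm, speed_cm_s=SOUND_SPEED_CM_S):
    """Return D(f), the (2,2) entry of the glottis-to-lips chain matrix.

    Each lossless section is [[cos kl, j*Z*sin kl], [j*sin kl/Z, cos kl]] with Z ~ 1/A.
    Products keep real diagonals and imaginary off-diagonals, so we store
    [[t11, t12], [t21, t22]] meaning [[t11, j*t12], [j*t21, t22]].
    With pressure P = 0 at the lips, U_lips / U_glottis = 1 / D(f).
    """
    k = 2.0 * math.pi * freq_hz / speed_cm_s
    t11, t12, t21, t22 = 1.0, 0.0, 0.0, 1.0  # identity matrix
    for area in areas_cm2:  # sections ordered from glottis to lips
        z = 1.0 / area  # rho*c cancels out of D(f), so it is omitted
        cs, sn = math.cos(k * section_len_cm), math.sin(k * section_len_cm)
        s11, s12, s21, s22 = cs, z * sn, sn / z, cs
        t11, t12, t21, t22 = (t11 * s11 - t12 * s21, t11 * s12 + t12 * s22,
                              t21 * s11 + t22 * s21, -t21 * s12 + t22 * s22)
    return t22


def formants(areas_cm2, tract_len_cm, n_formants=3, f_max=5000.0, step=10.0):
    """Find the first zeros of D(f) (the formants) by a grid scan plus bisection."""
    section_len = tract_len_cm / len(areas_cm2)
    found = []
    f0 = step / 2.0
    d0 = glottal_term(f0, areas_cm2, section_len)
    while f0 < f_max and len(found) < n_formants:
        f1 = f0 + step
        d1 = glottal_term(f1, areas_cm2, section_len)
        if d0 * d1 < 0.0 or d1 == 0.0:
            lo, hi, d_lo = f0, f1, d0
            for _ in range(60):  # bisection on the sign change
                mid = 0.5 * (lo + hi)
                d_mid = glottal_term(mid, areas_cm2, section_len)
                if d_lo * d_mid <= 0.0:
                    hi = mid
                else:
                    lo, d_lo = mid, d_mid
            found.append(0.5 * (lo + hi))
        f0, d0 = f1, d1
    return found


def mean_mismatch_percent(estimated_hz, measured_hz):
    """Mean relative formant error in percent (assumed definition of 'mismatch')."""
    errors = [abs(e - m) / m * 100.0 for e, m in zip(estimated_hz, measured_hz)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Sanity check: a uniform 17.5 cm tube resonates at c / 4L and its odd multiples.
    print([round(f) for f in formants([3.0] * 20, 17.5)])  # → [500, 1500, 2500]
```

The uniform-tube check is a convenient design choice. For constant area the chain product reduces to D(f) = cos(kL), so the expected formants are known in closed form. For real data, the MRI-derived areas would be passed directly to `formants`. The X-ray midsagittal distances would first go through `distances_to_areas` with the region's fitted coefficients.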
