A magnetic resonance imaging-based articulatory and acoustic study of "retroflex" and "bunched" American English /r/.

Speakers of rhotic dialects of North American English show a range of different tongue configurations for /r/. These variants produce acoustic profiles that are indistinguishable for the first three formants [Delattre, P., and Freeman, D. C., (1968). "A dialect study of American English r's by x-ray motion picture," Linguistics 44, 28-69; Westbury, J. R. et al. (1998), "Differences among speakers in lingual articulation for American English /r/," Speech Commun. 26, 203-206]. This is puzzling, given the very different vocal tract configurations involved. This paper contrasts two subjects whose productions of "retroflex" /r/ and "bunched" /r/ show similar patterns of F1-F3 but very different spacing between F4 and F5. Results from computer vocal tract models, built using finite element analysis and area functions based on magnetic resonance images of the vocal tract for sustained productions, are compared to actual speech recordings. In particular, formant-cavity affiliations are explored using formant sensitivity functions and simple-tube models of the vocal tract. The difference in F4/F5 patterns between the two subjects is confirmed for several additional subjects with retroflex and bunched vocal tract configurations. The results suggest that the F4/F5 differences between the variants can be largely explained by differences in whether the long cavity behind the palatal constriction acts as a half- or a quarter-wavelength resonator.
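The half- versus quarter-wavelength distinction can be illustrated with the textbook resonance formulas for a uniform tube. The sketch below is not the paper's model: the cavity length of 10 cm and the speed of sound of 35000 cm/s are assumed values chosen only for illustration, and the function names are hypothetical. It shows how the same cavity length gives differently spaced resonances depending on which type of resonator it behaves as.

```python
# Idealized uniform-tube resonances (lossless, no end corrections).
# All numeric values here are illustrative assumptions, not data from the paper.

C_CM_PER_S = 35000.0  # assumed speed of sound in warm, moist air (cm/s)


def quarter_wave_resonances(length_cm, n_modes=3, c=C_CM_PER_S):
    """Resonances of a tube closed at one end and open at the other:
    f_n = (2n - 1) * c / (4 * L), for n = 1, 2, ..."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_modes + 1)]


def half_wave_resonances(length_cm, n_modes=3, c=C_CM_PER_S):
    """Resonances of a tube with matching end conditions (both effectively
    closed or both open): f_n = n * c / (2 * L), for n = 1, 2, ..."""
    return [n * c / (2.0 * length_cm) for n in range(1, n_modes + 1)]


if __name__ == "__main__":
    back_cavity_cm = 10.0  # hypothetical back-cavity length
    print("quarter-wave (Hz):", quarter_wave_resonances(back_cavity_cm))
    print("half-wave (Hz):   ", half_wave_resonances(back_cavity_cm))
```

For the assumed 10 cm cavity, the quarter-wavelength case gives resonances at 875, 2625, and 4375 Hz, while the half-wavelength case gives 1750, 3500, and 5250 Hz. The two cases therefore place their higher resonances at different frequencies, which is the kind of effect the abstract invokes for the F4/F5 difference. A real vocal tract is not a uniform tube, so these numbers indicate the trend only.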

[1]  I. Lehiste Acoustical characteristics of selected English consonants , 1965 .

[2]  P. W. Nye,et al.  Analysis of vocal tract shape and dimensions using magnetic resonance imaging: vowels. , 1991, The Journal of the Acoustical Society of America.

[3]  Abeer Alwan,et al.  Acoustic modelling of American English /r/ , 1997, EUROSPEECH.

[4]  Robert Hagiwara WPP, No. 90: Acoustic Realizations of American /r/ as Produced by Women and Men , 1995 .

[5]  藤村 靖,et al.  Gunnar Fant: Acoustic Theory of Speech Production : with Calculations based on X-Ray Studies of Russian Articulations, Mouton & Co, 1960, 's-Gravenhage $ 15 , 1962 .

[6]  A. Liberman,et al.  Acoustic Cues for the Perception of Initial /w, j, r, l/ in English , 1957 .

[7]  Zhaoyan Zhang,et al.  VTAR: A Matlab-based computer program for vocal tract acoustic modeling , 2004 .

[8]  Shinobu Masaki,et al.  Measurement of temporal changes in vocal tract area function from 3D cine-MRI data. , 2006, The Journal of the Acoustical Society of America.

[9]  K. Honda,et al.  Cyclicity of laryngeal cavity resonance due to vocal fold vibration. , 2006, The Journal of the Acoustical Society of America.

[10]  Mark K. Tiede,et al.  Modeling of the front cavity and sublingual space in American English rhotic sounds , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[11]  Nobuhiro Miki,et al.  Transfer function of 3-D vocal tract model with higher mode , 1996 .

[12]  Kiyoshi Honda,et al.  Acoustic roles of the laryngeal cavity in vocal tract resonance. , 2006, The Journal of the Acoustical Society of America.

[13]  M M Sondhi Resonances of a bent vocal tract. , 1986, The Journal of the Acoustical Society of America.

[14]  Simon King,et al.  Speech production knowledge in automatic speech recognition. , 2007, The Journal of the Acoustical Society of America.

[15]  G. Fant Acoustic theory of speech production : with calculations based on X-ray studies of Russian articulations , 1961 .

[16]  E. Hoffman,et al.  Vocal tract area functions from magnetic resonance imaging. , 1996, The Journal of the Acoustical Society of America.

[17]  Mary J. Lindstrom,et al.  Differences among speakers in lingual articulation for American English /r/ , 1998, Speech Commun..

[18]  C Y Espy-Wilson,et al.  Articulatory tradeoffs reduce acoustic variability during American English /r/ production. , 1999, The Journal of the Acoustical Society of America.

[19]  Nobuhiro Miki,et al.  3D finite element analysis of Japanese vowels in elliptic sound tube model , 2000 .

[20]  P. Delattre,et al.  A dialect study of American r’s by x-ray motion picture , 1968 .

[21]  Brad H Story,et al.  Technique for "tuning" vocal tract area functions based on acoustic sensitivity functions. , 2006, The Journal of the Acoustical Society of America.

[22]  Shrikanth S. Narayanan,et al.  Toward articulatory-acoustic models for liquid approximants based on MRI and EPG data. Part I. The laterals , 1997 .

[23]  R M Dalston,et al.  Acoustic characteristics of English /w,r,l/ spoken correctly by young children and adults. , 1975, The Journal of the Acoustical Society of America.

[24]  Kunitoshi Motoki,et al.  Three-dimensional acoustic field in vocal-tract , 2002 .

[25]  Mohamad Mrayati,et al.  Distinctive regions and modes: a new theory of speech production , 1988, Speech Commun..

[26]  Mark Hasegawa-Johnson,et al.  Landmark-based speech recognition: report of the 2004 Johns Hopkins summer workshop , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[27]  Nobuhiro Miki,et al.  FEM analysis of sound wave propagation in the vocal tract with 3-D radiational model , 1996 .

[28]  T. J. Thomas A finite element model of fluid flow in the vocal tract , 1986 .

[29]  Kenneth N. Stevens,et al.  On the Derivation of Area Functions and Acoustic Spectra from Cinéradiographic Films of Speech , 1964 .

[30]  L. Lisker Minimal Cues for Separating /w, r, l, y/ in Intervocalic Position , 1957 .

[31]  Carol Y. Espy-Wilson Articulatory strategies, speech acoustics and variability , 2004 .

[32]  C. Espy-Wilson,et al.  The relevance of F4 in distinguishing between different articulatory configurations of American English /r/ , 1999 .

[33]  K Honda,et al.  Acoustic characteristics of the piriform fossa in models and humans. , 1997, The Journal of the Acoustical Society of America.

[34]  Suzanne Boyce,et al.  A new taxonomy of American English /r/ using MRI and ultrasound , 2004 .

[35]  Carol Espy-Wilson,et al.  A probabilistic framework for landmark detection based on phonetic features for automatic speech recognition. , 2008, The Journal of the Acoustical Society of America.