Truncation of pharyngeal gesture in English diphthong [aɪ]

It is well established that [a] in English diphthongs (e.g. [a] in “pie’d”) has a different formant structure from its closest corresponding monophthong (e.g. [a] in “pod”). The current study proposes that these two sounds share the same cognitive unit, namely the pharyngeal constriction gesture that produces [a], and that the surface difference can be modeled as a consequence of the same articulatory movement being truncated in time by the following palatal glide in the diphthongal environment. Formation of the pharyngeal constriction gesture during the production of [a] in a diphthong and in its corresponding monophthong was observed in various timing contexts using real-time MRI, and the collected production data were quantitatively analyzed using the direct image analysis (DIA) technique, which infers tissue movement by tracking pixel intensity change over time in regions of interest. Results support our truncation account in that: (1) the formation time of the pharyngeal constriction is significantly longer in monophthongs than in diphthongs; (2) this duration correlates with the resulting constriction degree; and (3) the resulting constriction degree predicts the acoustic difference in the F2 dimension, consistent with our hypothesis.
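The core DIA measurement described above can be sketched in a few lines: mean pixel intensity within a fixed region of interest serves as a proxy for tissue presence, and constriction formation time is taken as the rise time of that signal between two normalized thresholds. This is a minimal illustration, not the study's implementation; the threshold values and function names are assumptions for exposition.

```python
import numpy as np

def roi_intensity_series(frames, roi_mask):
    """Mean pixel intensity within a boolean ROI mask for each rtMRI frame.

    frames: iterable of 2-D arrays (one per video frame)
    roi_mask: 2-D boolean array marking the region of interest
    """
    return np.array([frame[roi_mask].mean() for frame in frames])

def constriction_formation_time(series, fps, lo=0.2, hi=0.8):
    """Rise time (in seconds) of the ROI intensity signal.

    The series is min-max normalized, and formation time is measured
    between the first frames crossing the `lo` and `hi` thresholds
    (illustrative values, not taken from the study).
    """
    norm = (series - series.min()) / (series.max() - series.min())
    t_lo = int(np.argmax(norm >= lo))  # first frame at/above lower threshold
    t_hi = int(np.argmax(norm >= hi))  # first frame at/above upper threshold
    return (t_hi - t_lo) / fps
```

Under the truncation account, a diphthongal [a] would show a shorter rise time and a lower plateau (smaller constriction degree) in such a series than its monophthongal counterpart.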
