Factor analysis of vocal-tract outlines derived from real-time magnetic resonance imaging data

We present a factor analysis of vocal-tract outlines derived automatically from real-time magnetic resonance imaging (rtMRI) sequences. The analysis yields a compact representation of vocal-tract shapes in which every utterance is represented by a small set of trajectories. Each trajectory gives the time-varying weight of one linguistically interpretable vocal-tract deformation, and a shape is modeled as a linear combination of these deformations. Vocal-tract shapes can be reconstructed with good accuracy from these trajectories. The work draws on a significantly larger number of speech frames than previous attempts at articulatory modeling. The proposed method is illustrated through a case study of rtMRI data corresponding to 250 sentences spoken by a single speaker, and the results underscore the promise of the methodology for phonological analysis and articulatory synthesis.
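To make the representation concrete, the following is a minimal sketch of a linear shape model in which each frame is approximated as a mean outline plus a weighted sum of a few deformation vectors. It is not the authors' method: plain PCA (via SVD) stands in for the factor analysis, the data are synthetic rather than rtMRI outlines, and all names (`fit_linear_shape_model`, `reconstruct`, the array sizes) are hypothetical choices for illustration.

```python
import numpy as np


def fit_linear_shape_model(shapes, n_factors):
    """Fit a PCA-based linear model to flattened outlines.

    shapes: array of shape (n_frames, n_coords), one flattened outline per frame.
    Returns the mean outline, an orthonormal basis of deformations
    (n_factors, n_coords), and per-frame weights (n_frames, n_factors).
    """
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # Rows of vt are orthonormal directions ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_factors]
    weights = centered @ basis.T  # one weight trajectory per factor
    return mean, basis, weights


def reconstruct(mean, basis, weights):
    """Rebuild outlines as mean plus a linear combination of deformations."""
    return mean + weights @ basis


# Synthetic stand-in data (assumption): 500 frames, 40 coordinates,
# generated from 3 underlying deformations plus small noise.
rng = np.random.default_rng(0)
n_frames, n_coords, n_true = 500, 40, 3
true_basis = rng.normal(size=(n_true, n_coords))
true_weights = rng.normal(size=(n_frames, n_true))
shapes = true_weights @ true_basis + 0.01 * rng.normal(size=(n_frames, n_coords))

mean, basis, weights = fit_linear_shape_model(shapes, n_factors=3)
recon = reconstruct(mean, basis, weights)
rms_error = float(np.sqrt(np.mean((shapes - recon) ** 2)))
print(f"weights shape: {weights.shape}, RMS reconstruction error: {rms_error:.4f}")
```

Each column of `weights` plays the role of one trajectory over time. PCA components are chosen only to maximize variance, so unlike the factors described in the abstract they are not guaranteed to be linguistically interpretable.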
