Dynamic 3-D Visualization of Vocal Tract Shaping During Speech

Noninvasive imaging is widely used in speech research to investigate the shaping and dynamics of the vocal tract during speech production. 3-D dynamic MRI would be a major advance, as it would provide 3-D dynamic visualization of the entire vocal tract. We present a novel method for creating 3-D dynamic movies of vocal tract shaping, based on the acquisition of 2-D dynamic data from parallel slices and temporal alignment of the image sequences using audio information. Multiple sagittal 2-D real-time movies with synchronized audio recordings are acquired for the English vowel-consonant-vowel stimuli /ala/, /ara/, /asa/, and /aʃa/. Audio data are aligned using mel-frequency cepstral coefficients (MFCC) extracted from windowed intervals of the speech signal. Sagittal image sequences acquired from all slices are then aligned using dynamic time warping (DTW). The aligned image sequences enable dynamic 3-D visualization through synthesized movies of the moving airway in coronal planes and through renderings of selected tissue surfaces and of the tube-shaped vocal tract airway, obtained after manual segmentation of the targeted articulators and smoothing. The resulting volumes allow for dynamic 3-D visualization of salient aspects of lingual articulation, including the formation of tongue grooves and sublingual cavities, with a temporal resolution of 78 ms.
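The alignment step can be illustrated with a minimal sketch of textbook dynamic time warping. This is not the authors' implementation. It assumes each repetition of a stimulus has already been reduced to one feature vector per audio window (MFCC vectors in the abstract; toy 1-D values here), and that the warping path found on the audio features is then reused to pick which image frame of another slice corresponds to each frame of a reference slice. The names `dtw_path` and `frame_map` are hypothetical.

```python
import math


def dtw_path(ref, query):
    """Align two feature sequences with classic dynamic time warping.

    ref, query: lists of equal-dimension feature vectors (for example,
    one MFCC vector per audio window). Returns (total_cost, path), where
    path is a list of (ref_index, query_index) pairs in time order.
    """
    n, m = len(ref), len(query)
    inf = float("inf")
    # cost[i][j]: minimal accumulated distance aligning ref[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], query[j - 1])  # Euclidean distance
            cost[i][j] = d + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )

    # Backtrack from the end to recover the optimal warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def frame_map(path, n_ref):
    """For each reference frame, keep the first query frame matched to it."""
    mapping = [None] * n_ref
    for r, q in path:
        if mapping[r] is None:
            mapping[r] = q
    return mapping


# Toy stand-ins for per-window features of two recordings of one stimulus.
ref_feats = [[0.0], [1.0], [2.0], [3.0]]
other_feats = [[0.0], [0.0], [1.0], [2.0], [3.0]]  # same shape, slower onset

total_cost, path = dtw_path(ref_feats, other_feats)
mapping = frame_map(path, len(ref_feats))
print(total_cost, path, mapping)
```

With such a mapping, the frames of a second slice could be reordered as `[frames[q] for q in mapping]` so that every slice shares the reference time axis before the slices are stacked into a volume. Keeping the first matched frame is only one possible choice; averaging all frames matched to a reference frame is another.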
