Advances in real-time magnetic resonance imaging of the vocal tract for speech science and technology research

Real-time magnetic resonance imaging (rtMRI) of the moving vocal tract during running speech production is an important emerging tool for speech production research, providing dynamic information about a speaker's upper airway in the entire mid-sagittal plane or in any other scan plane of interest. There have been several advances in the development of speech rtMRI and of corresponding analysis tools, as well as in their application to domains such as phonetics and phonological theory, articulatory modeling, and speaker characterization. An important recent development has been the open release of a database that includes speech rtMRI data from five male and five female speakers of American English, each producing 460 phonetically balanced sentences. The purpose of the present paper is to give an overview of, and an outlook on, advances in rtMRI as a tool for speech research and technology development.
