Analysis of speech production real-time MRI

Abstract: Recent advances in real-time magnetic resonance imaging (RT-MRI) have made it possible to study the anatomy and dynamic motion of the vocal tract during speech production in great detail. The abundance of rich data on speech articulation provided by medical imaging techniques affords new opportunities for speech science, linguistics, clinical and technological research, and application development, but also presents new challenges in audio–video data analysis and data modeling. We review techniques used to analyze articulatory data acquired with RT-MRI, and assess the utility of different approaches for different types of data and research goals.
