MRI Vocal Tract Sagittal Slices Estimation During Speech Production of CV

In this paper we propose an algorithm for estimating parasagittal slices of the vocal tract, in order to provide a more complete view of the behaviour of the articulators during speech production. The first step is to align with one another the consonant-vowel (CV) data acquired in the different sagittal planes of the training speaker. Sets of transformations linking the midsagittal frames with the neighbouring sagittal frames are then computed for the training speaker. A further set of transformations, which maps the midsagittal frames of the training speaker onto the corresponding midsagittal frames of the test speaker, is calculated and used to adapt the previously computed transformations to the test speaker's domain. The adapted transformations are applied to the midsagittal frames of the test speaker to estimate the neighbouring sagittal frames. Several single-speaker models are combined to produce the final frame estimate. The results were evaluated using image cross-correlation between the original and the estimated frames, and show good agreement between the two.
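The estimation pipeline summarized above can be illustrated with a minimal NumPy sketch. It is not the exact procedure of the paper. For illustration only, it assumes that every transformation is a dense 2D displacement field stored as an array of shape (2, H, W). It also assumes that adapting a training-speaker field to the test speaker simply resamples that field at the corresponding training-speaker location given by the inter-speaker field, without reorienting the vectors, and that the single-speaker estimates are combined by plain averaging. The helper names (`warp`, `adapt_field`, `estimate_parasagittal`, `ncc`) are hypothetical, and the registration step that would actually produce the fields is not shown.

```python
import numpy as np


def warp(image, disp):
    """Backward-warp a 2D image with a dense displacement field.

    disp has shape (2, H, W): disp[0] holds row offsets, disp[1] column
    offsets. Output pixel (r, c) is sampled from image at
    (r + disp[0, r, c], c + disp[1, r, c]) with bilinear interpolation;
    sample positions are clamped to the image border.
    """
    h, w = image.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    r = np.clip(rows + disp[0], 0, h - 1)
    c = np.clip(cols + disp[1], 0, w - 1)
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    fr, fc = r - r0, c - c0  # fractional parts act as interpolation weights
    top = image[r0, c0] * (1 - fc) + image[r0, c1] * fc
    bottom = image[r1, c0] * (1 - fc) + image[r1, c1] * fc
    return top * (1 - fr) + bottom * fr


def adapt_field(para_field, inter_field):
    """Carry a training-speaker field into the test speaker's space.

    Hypothetical simplification: inter_field is assumed to satisfy
    warp(train_mid, inter_field) ~ test_mid, so test pixel x corresponds to
    training location x + inter_field(x). Each component of para_field is
    read at that location; the vectors themselves are not reoriented.
    """
    return np.stack([warp(para_field[k], inter_field) for k in range(2)])


def estimate_parasagittal(test_mid, models):
    """Average the estimates of several single-speaker models.

    models is a list of (para_field, inter_field) pairs, one per training
    speaker, where warp(train_mid, para_field) ~ train_parasagittal.
    """
    estimates = [warp(test_mid, adapt_field(p, a)) for p, a in models]
    return np.mean(estimates, axis=0)


def ncc(a, b):
    """Zero-normalized cross-correlation of two images, in [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mid = rng.random((64, 64))          # stand-in for a test midsagittal frame
    shift = np.zeros((2, 64, 64))
    shift[1] = 1.0                      # toy midsagittal-to-parasagittal field
    identity = np.zeros((2, 64, 64))    # toy inter-speaker field (no change)
    est = estimate_parasagittal(mid, [(shift, identity)])
    print(round(ncc(est, warp(mid, shift)), 6))  # 1.0
```

Zero-normalized cross-correlation is used here as one common form of image cross-correlation: a value of 1 means the two frames are identical up to a positive scaling and offset of their intensities, which makes the score insensitive to global brightness differences between acquisitions.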
