A Multimodal Real-Time MRI Articulatory Corpus of French for Speech Research

In this work we describe the creation of ArtSpeechMRIfr, a database of real-time and static magnetic resonance imaging (rtMRI, 3D MRI) of the vocal tract. The database also contains processed data: denoised audio, its phonetically aligned annotation, articulatory contours, and vocal tract volume information, which together provide a rich resource for speech research. The database is built on data from two male speakers of French. It covers a number of phonetic contexts in the controlled part, as well as spontaneous speech, 3D MRI scans of sustained vocalic articulations, and 3D MRI scans of the subjects' dental casts. The rtMRI corpus consists of 79 synthetic sentences constructed from a phonetized dictionary, which makes it possible to shorten the acquisitions while keeping very good coverage of the phonetic contexts that exist in French. The 3D MRI includes acquisitions for 12 French vowels and 10 consonants, each of which was pronounced in several vocalic contexts. Articulatory contours (tongue, jaw, epiglottis, larynx, velum, lips) as well as 3D volumes were manually drawn for a subset of the images.
