A Multimodal Real-Time MRI Articulatory Corpus for Speech Research

We present MRI-TIMIT, a large-scale database of synchronized audio and real-time magnetic resonance imaging (rtMRI) data for speech research. The database currently consists of speech data acquired from two male and two female speakers of American English. Subjects’ upper airways were imaged in the midsagittal plane while they read the same 460-sentence set used in the MOCHA-TIMIT corpus [1]. The accompanying acoustic recordings were phonemically transcribed using forced alignment. Vocal tract tissue boundaries were automatically identified in each video frame, allowing dynamic quantification of each speaker’s midsagittal articulation. The database and companion toolset provide a unique resource for examining articulatory-acoustic relationships in speech production.

Index Terms: speech production, speech corpora, real-time MRI, multi-modal database, large-scale phonetic tools

[1]  Athanasios Katsamanis, et al.  Statistical multi-stream modeling of real-time MRI articulatory speech data, 2010, INTERSPEECH.

[2]  Simon King, et al.  Speech production knowledge in automatic speech recognition, 2007, The Journal of the Acoustical Society of America.

[3]  Shrikanth Narayanan, et al.  An exploratory study of emotional speech production using functional data analysis techniques, 2006.

[4]  Shrikanth S. Narayanan, et al.  Region Segmentation in the Frequency Domain Applied to Upper Airway Real-Time Magnetic Resonance Images, 2009, IEEE Transactions on Medical Imaging.

[5]  B. Atal, et al.  Inversion of articulatory-to-acoustic transformation in the vocal tract by a computer-sorting technique, 1978, The Journal of the Acoustical Society of America.

[6]  Shrikanth Narayanan, et al.  Synchronized and noise-robust audio recordings during realtime magnetic resonance imaging scans, 2006, The Journal of the Acoustical Society of America.

[7]  Shrikanth Narayanan, et al.  Temporal analysis of articulatory speech errors using direct image analysis of real time magnetic resonance imaging, 2010.

[8]  Carol Y. Espy-Wilson.  Articulatory Strategies, Speech Acoustics and Variability, 2004.

[9]  A. Liberman, et al.  On the relation of speech to language, 2000, Trends in Cognitive Sciences.

[10]  Raymond D. Kent, et al.  X‐ray microbeam speech production database, 1990.

[11]  M. M. Sondhi, et al.  The potential role of speech production models in automatic speech recognition, 1996, The Journal of the Acoustical Society of America.

[12]  Shrikanth Narayanan, et al.  Morphological Variation in the Adult Vocal Tract: A Study Using rtMRI, 2010.

[13]  Yoon-Chul Kim, et al.  Seeing speech: Capturing vocal tract shaping using real-time magnetic resonance imaging [Exploratory DSP], 2008, IEEE Signal Processing Magazine.

[14]  Panayiotis G. Georgiou, et al.  SailAlign: Robust long speech-text alignment, 2011.

[15]  Tzyy-Ping Jung, et al.  Deriving gestural score from articulator-movement records using weighted temporal decomposition, 1996, IEEE Trans. Speech Audio Process.

[16]  Athanasios Katsamanis, et al.  Direct Estimation of Articulatory Kinematics from Real-Time Magnetic Resonance Image Sequences, 2011, INTERSPEECH.

[17]  Elliot Saltzman, et al.  Articulatory Information for Noise Robust Speech Recognition, 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[18]  Shrikanth S. Narayanan, et al.  Data-driven analysis of realtime vocal tract MRI using correlated image regions, 2010, INTERSPEECH.

[19]  Shrikanth Narayanan, et al.  An approach to real-time magnetic resonance imaging for speech production, 2003, The Journal of the Acoustical Society of America.

[20]  Shrikanth Narayanan, et al.  A generalized smoothness criterion for acoustic-to-articulatory inversion, 2010, The Journal of the Acoustical Society of America.

[21]  Athanasios Katsamanis, et al.  Validating rt-MRI Based Articulatory Representations via Articulatory Recognition, 2011, INTERSPEECH.

[22]  Louis Goldstein, et al.  Automatic Analysis of Singleton and Geminate Consonant Articulation Using Real-Time Magnetic Resonance Imaging, 2011, INTERSPEECH.

[23]  Jonathan G. Fiscus, et al.  DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM, 1993, NIST.