A study of emotional speech articulation using a fast magnetic resonance imaging technique

A recently developed fast MR imaging system is used to study emotional speech production. Speech utterances and the corresponding mid-sagittal vocal tract images are acquired simultaneously by the MRI system. Neutral, angry, sad, and happy emotions are simulated by a male American English speaker. The MRI system and analysis results are described in this report. In general, articulation in emotional speech is found to be more active in terms of both the rate of vocal tract shaping and the range of spectral parameter values. It is confirmed that angry speech is characterized by wider and faster vocal tract shaping. Moreover, angry speech shows more prominent use of the pharyngeal region than any of the other emotions. It is also observed that the average vocal tract length above the false vocal folds varies as a function of emotion, and that happy speech exhibits a relatively shorter length than the other emotions. This is likely due to elevation of the larynx, which may facilitate the higher pitch and wider pitch range used by the speaker to encode happy emotional quality.
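As a rough illustration of the kinds of measures described above (not the paper's actual analysis pipeline), the sketch below shows two hypothetical helpers: `shaping_rate`, a frame-differencing proxy for the rate of vocal tract shaping in an MRI sequence, and `midline_length_cm`, which sums segment lengths along a manually traced mid-sagittal vocal tract midline. All function names, and the example frame rate, are assumptions introduced here for illustration only.

```python
import numpy as np

def shaping_rate(frames, frame_rate_hz):
    """Mean absolute frame-to-frame intensity change over an MRI sequence,
    used here as a crude proxy for the rate of vocal tract shaping.

    frames: array of shape (T, H, W), one grayscale image per time step.
    Returns one value per frame pair, in intensity change per second."""
    diffs = np.abs(np.diff(frames.astype(float), axis=0))  # (T-1, H, W)
    return diffs.mean(axis=(1, 2)) * frame_rate_hz

def midline_length_cm(points_cm):
    """Cumulative Euclidean length along a traced mid-sagittal vocal tract
    midline (e.g., false vocal folds to lips), given (x, y) points in cm."""
    segments = np.diff(points_cm, axis=0)
    return float(np.sqrt((segments ** 2).sum(axis=1)).sum())

# Illustrative use with stand-in data (the frame rate is assumed, not taken
# from the paper): random "images" and a short synthetic midline trace.
frames = np.random.rand(16, 68, 68)
print(shaping_rate(frames, frame_rate_hz=9.0))
print(midline_length_cm(np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])))
```

Comparing such per-utterance values across the simulated emotions would be one simple way to quantify claims like "wider and faster vocal tract shaping" or a "relatively shorter" vocal tract length.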
