A study of emotional information present in articulatory movements estimated using acoustic-to-articulatory inversion

This study examines the emotion-specific information (ESI) in articulatory movements estimated by acoustic-to-articulatory inversion on emotional speech. We study two main aspects: (1) the degree of similarity between estimated and directly measured articulatory trajectories for the same and for different emotions, and (2) the amount of ESI carried by the estimated trajectories. These are evaluated using the mean squared error between each trajectory pair and through automatic emotion classification. The study uses parallel acoustic and articulatory data for five elicited emotions spoken by three native American English speakers. We also compare emotion classification performance using articulatory trajectories estimated from different acoustic feature sets and find that the best-performing feature set is subject-dependent. Experimental results suggest that the ESI in the estimated trajectories, although smaller than that in the direct articulatory measurements, is complementary to that in prosodic features, indicating the usefulness of estimated articulatory data for emotion research.
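As a minimal sketch of the first evaluation above, the following computes a frame-wise mean squared error between a directly measured articulatory trajectory and one estimated by inversion. The function name, the list-of-frames data layout, and the toy values are illustrative assumptions, not the paper's actual code or data.

```python
def trajectory_mse(measured, estimated):
    """Mean squared error over time frames and articulatory channels.

    Both arguments are lists of frames; each frame is a list of channel
    values (e.g., x/y positions of articulator sensors). The trajectories
    are assumed to be time-aligned, as in a parallel acoustic-articulatory
    corpus.
    """
    assert len(measured) == len(estimated), "trajectories must be time-aligned"
    total, count = 0.0, 0
    for m_frame, e_frame in zip(measured, estimated):
        for m, e in zip(m_frame, e_frame):
            total += (m - e) ** 2
            count += 1
    return total / count

# Toy example: 3 frames, 2 articulatory channels (hypothetical values in mm).
measured  = [[1.0, 2.0], [1.5, 2.5], [2.0, 3.0]]
estimated = [[1.1, 1.9], [1.4, 2.6], [2.2, 2.9]]
print(trajectory_mse(measured, estimated))
```

Comparing this MSE for same-emotion versus cross-emotion trajectory pairs is one way to quantify how much emotion-specific structure the inversion preserves.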
