An Approach for Emotion Recognition Using Purely Segment-Level Acoustic Features

This paper proposes a purely segment-level approach to emotion recognition that entirely abandons utterance-level features, focusing instead on extracting emotional information from a number of selected segments within each utterance. To realize this purely segment-level model, we designed two segment-selection approaches, miSATIR and crSATIR, which use information theory and correlation coefficients, respectively, to choose the utterance segments from which features are extracted. After determining the appropriate time interval for the segments, we built a recognition model on the speech frames of the selected segments. Tests on a 50-person emotional speech database designed specifically for this research showed significant improvements in average accuracy (more than 20%) over existing approaches that use the information of entire utterances. Results on speech signals elicited by the International Affective Picture System (IAPS) showed that the proposed method can also be used for emotion-strength analysis.

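The abstract does not include an implementation, so the following is a minimal illustrative sketch of the two selection criteria it names. Everything beyond the abstract is an assumption made for illustration: the function names (`mutual_information`, `select_segments`), the histogram estimator of mutual information, and the simplification that each segment is summarized by a single scalar acoustic feature per utterance. The "mi" mode scores candidate segment positions by the mutual information between that feature and the emotion label (miSATIR-style), and the "cr" mode by the absolute Pearson correlation coefficient (crSATIR-style); the authors' actual miSATIR/crSATIR procedures may differ.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate (in nats) of I(X; Y) between a 1-D
    continuous feature x and integer class labels y."""
    edges = np.histogram_bin_edges(x, bins=bins)
    x_binned = np.digitize(x, edges)           # bin indices in 0..bins+1
    joint = np.zeros((bins + 2, int(y.max()) + 1))
    for xi, yi in zip(x_binned, y):
        joint[xi, yi] += 1.0
    joint /= joint.sum()                       # joint distribution p(x, y)
    px = joint.sum(axis=1, keepdims=True)      # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)      # marginal p(y)
    nz = joint > 0                             # avoid log(0)
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def select_segments(segment_features, labels, top_k=5, mode="mi"):
    """Rank candidate segment positions and keep the top_k.

    segment_features: shape (n_utterances, n_segments); one scalar
    acoustic feature per segment (a deliberate simplification).
    labels: integer emotion classes, shape (n_utterances,).
    """
    n_segments = segment_features.shape[1]
    if mode == "mi":                           # miSATIR-style criterion
        scores = [mutual_information(segment_features[:, j], labels)
                  for j in range(n_segments)]
    else:                                      # crSATIR-style criterion
        scores = [abs(np.corrcoef(segment_features[:, j], labels)[0, 1])
                  for j in range(n_segments)]
    return np.argsort(scores)[::-1][:top_k]    # indices of best segments

# Toy usage: 40 utterances, 10 candidate segments each, 4 emotion classes.
rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=40)
feats = rng.normal(size=(40, 10))
feats[:, 3] += labels                          # make segment 3 informative
print(select_segments(feats, labels, top_k=3, mode="mi"))
```

Both criteria serve the same purpose: ranking candidate segments by how much they reveal about the emotion class, so that only the most informative segments contribute features to the segment-level recognition model.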