Emotion Recognition from Speech using Prosodic and Linguistic Features

Emotions can be inferred from the speech signal; however, the variability of speech makes emotion recognition a challenging task. Several classes of features indicate the presence of emotion. Prosodic and temporal features have previously been used to identify emotions, but on their own, neither prosodic/temporal features nor linguistic features yield adequate accuracy. Emotions can also be inferred from linguistic features if the spoken content can be identified. We therefore combine prosodic and temporal features with linguistic features, which improves the accuracy of emotion recognition; this is the first contribution reported in this paper. We propose a two-step model: in the first step, we classify emotions based on prosodic features; in the second step, we extract emotions from word segmentation combined with linguistic features. Our experiments further show that classifiers trained without considering the speaker's age do not improve accuracy. We argue that the classifier should be trained on the age group for which emotion recognition is actually required; this is the second contribution submitted in this paper.
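The two-step model described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature proxies (short-time energy and zero-crossing rate as stand-ins for prosody), the thresholds, the keyword lists, and all function names are assumptions made for the example.

```python
import math

# Hypothetical sketch of a two-step emotion recognizer.
# Step 1 uses prosodic proxies; step 2 refines the decision with
# linguistic cues from the recognized words. All names, thresholds,
# and word lists are illustrative, not taken from the paper.

def prosodic_features(samples):
    """Compute simple prosodic proxies: mean short-time energy
    and zero-crossing rate over the utterance."""
    energy = sum(s * s for s in samples) / len(samples)
    zcr = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
    zcr /= len(samples) - 1
    return {"energy": energy, "zcr": zcr}

def step1_prosodic(features):
    """Step 1: coarse arousal decision from prosody
    (loud, rapidly oscillating speech -> 'active')."""
    if features["energy"] > 0.1 and features["zcr"] > 0.05:
        return "active"
    return "passive"

# Illustrative keyword lexicons for the linguistic step.
NEGATIVE_WORDS = {"hate", "furious", "terrible"}
POSITIVE_WORDS = {"great", "wonderful", "love"}

def step2_linguistic(arousal, words):
    """Step 2: refine the prosodic decision using word-level
    linguistic cues from segmentation/recognition output."""
    tokens = {w.lower() for w in words}
    if arousal == "active":
        if tokens & NEGATIVE_WORDS:
            return "anger"
        if tokens & POSITIVE_WORDS:
            return "happiness"
        return "surprise"
    return "sadness" if tokens & NEGATIVE_WORDS else "neutral"

# Example: a loud 440 Hz tone standing in for an excited utterance,
# paired with negatively charged words.
signal = [0.5 * math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
arousal = step1_prosodic(prosodic_features(signal))
emotion = step2_linguistic(arousal, ["I", "hate", "this"])
```

In a real system, step 1 would use pitch contour, intensity, and duration statistics fed to a trained classifier (and, per the paper's second contribution, one classifier per age group), while step 2 would operate on the output of a speech recognizer rather than hand-supplied words.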
