Recognition of vocal emotions from acoustic profile

This paper concerns the recognition of five discrete basic emotions (anger, sadness, happiness, neutral, and fear), a capability increasingly required for intelligent human-computer interaction. The proposed system recognizes emotion in utterances of variable duration and is independent of language, culture, and speaker. A comprehensive study of the discriminant acoustic vocal features that assist emotion recognition is performed. To capture feature values that persist across adjacent and subsequent frames, a new pre-processing method is proposed: it aggregates the feature vectors over three adjacent, sequential combinations of frames obtained by segmenting the digitized speech signal. A three-layered SVM classifier with an RBF kernel is used, exploiting the dominant prosodic discriminant features at different layers. Experiments are conducted on both the standard Berlin Database of Emotional Speech (EMO-DB) and acted speech recorded by Indian speakers in Hindi and English. The results show that the system performs well under this architecture, achieving an average accuracy of approximately 85% across all emotions.
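As a minimal sketch of the two ideas described above, the snippet below illustrates (a) aggregating per-frame feature vectors over three adjacent frames and (b) a hierarchical, layered RBF-kernel SVM. The function names, the grouping of emotions at each layer, and the hyper-parameters are hypothetical placeholders, not the authors' exact design; scikit-learn (which wraps LIBSVM) stands in for whatever SVM implementation was actually used.

```python
# Illustrative sketch only: frame aggregation plus a three-layer
# RBF-kernel SVM. All names and layer groupings are assumptions.
import numpy as np
from sklearn.svm import SVC

def aggregate_adjacent_frames(frame_features: np.ndarray, group: int = 3) -> np.ndarray:
    """Average feature vectors over `group` adjacent frames so that
    values sustained across neighbouring frames are captured.

    frame_features: array of shape (n_frames, n_features).
    """
    n = len(frame_features) - group + 1
    return np.array([frame_features[i:i + group].mean(axis=0)
                     for i in range(n)])

# Hypothetical hierarchical layout: each layer is a binary RBF-kernel
# SVM that splits off a subset of emotions (e.g. high-arousal vs.
# low-arousal at layer 1, then finer splits at layers 2 and 3).
layer1 = SVC(kernel="rbf", C=10, gamma="scale")
layer2 = SVC(kernel="rbf", C=10, gamma="scale")
layer3 = SVC(kernel="rbf", C=10, gamma="scale")
```

In such a cascade, each layer would be trained on the prosodic features found most discriminant for its particular split, and a test utterance is routed down the tree until a leaf emotion label is reached.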
