Upgrading the Performance of Speech Emotion Recognition at the Segmental Level

This paper presents an efficient approach to maximizing the accuracy of automatic speech emotion recognition in English using minimal inputs, minimal features, lower algorithmic complexity, and reduced processing time. Whereas the findings reported here are based exclusively on vowel formants, most related previous work used tens or even hundreds of other features; despite the heavier signal processing involved, the recognition accuracies reported earlier were often lower than that obtained by our approach. The method is based on vowel utterances. The first step is statistical pre-processing of the vowel formants, followed by identification of the best formants using the k-means, k-nearest neighbor, and naive Bayes classifiers. An artificial neural network used for the final classification gave an accuracy of 95.6% on elicited emotional speech. Nearly 1500 speech files from ten female speakers, covering the neutral state and six basic emotions, were used to demonstrate the efficiency of the proposed approach. Such a result has not been reported earlier for English and is of significance to researchers, sociologists, and others interested in speech.
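The pipeline described above can be summarized in code. The sketch below is an illustrative reconstruction, not the authors' implementation: it assumes the features are per-utterance formant values (e.g., F1 to F3), stands in scikit-learn's KMeans, KNeighborsClassifier, GaussianNB, and MLPClassifier for whatever specific implementations the paper used, and substitutes synthetic random data for the actual speech corpus.

```python
# Illustrative sketch of the described pipeline (assumed details, not the
# authors' code). X would hold formant values per vowel utterance; here it
# is synthetic stand-in data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples, n_formants, n_emotions = 1500, 3, 7
X = rng.normal(size=(n_samples, n_formants))     # placeholder formant values
y = rng.integers(0, n_emotions, size=n_samples)  # neutral + six basic emotions

# Step 1: statistical pre-processing -- z-score each formant.
X = StandardScaler().fit_transform(X)

# Step 2: rank individual formants by how well simple classifiers/clusterers
# (k-means, KNN, naive Bayes) separate the emotions using each formant alone.
for f in range(n_formants):
    Xf = X[:, [f]]
    knn_acc = cross_val_score(KNeighborsClassifier(5), Xf, y, cv=5).mean()
    nb_acc = cross_val_score(GaussianNB(), Xf, y, cv=5).mean()
    km = KMeans(n_clusters=n_emotions, n_init=10, random_state=0).fit(Xf)
    km_ari = adjusted_rand_score(y, km.labels_)
    print(f"F{f+1}: KNN={knn_acc:.3f}, NB={nb_acc:.3f}, k-means ARI={km_ari:.3f}")

# Step 3: train an artificial neural network on the selected formants
# (all three retained here for illustration) for the final classification.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
ann = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
ann.fit(X_tr, y_tr)
print("ANN test accuracy:", ann.score(X_te, y_te))
```

On real formant data, the per-formant scores in step 2 would guide which formants feed the network in step 3; the synthetic data here will naturally score near chance.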
