Analysis of feature extraction techniques for improved emotion recognition in presence of additive noise

Recent studies have examined feature extraction and classification for emotion recognition. The recognition rate of a Speech Emotion Recognition (SER) system degrades in a noisy environment. This paper proposes a new feature extraction approach for robust emotion recognition in noisy environments. It uses a cochlear filterbank with zero-crossing for frequency estimation and a multiclass Support Vector Machine for classification. Experimental results show that the proposed cochlear feature with Zero Crossing (ZC) identifies the emotional state in a voiced signal more accurately than the baseline MFCC approach. When the cochlear filterbank coefficients are combined with prosodic features (i.e., energy and pitch), the recognition rate on the same database improves by 9.92%, 2.5%, and 3.52% for the various noise levels in the testing dataset.
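
The zero-crossing idea behind the frequency estimate can be illustrated with a minimal sketch. It is not the paper's implementation: the cochlear filterbank is omitted, and the function name `zero_crossing_frequency`, the 16 kHz sample rate, and the 440 Hz test tone are assumptions chosen for illustration. For a narrowband signal, such as the output of one filterbank channel, each cycle produces two zero crossings, so the frequency is roughly the crossing count divided by twice the duration.

```python
import math


def zero_crossing_frequency(samples, sample_rate):
    """Estimate the dominant frequency (Hz) of a narrowband signal
    from the number of sign changes between consecutive samples."""
    crossings = sum(
        1
        for prev, cur in zip(samples, samples[1:])
        if (prev < 0.0) != (cur < 0.0)
    )
    # Time spanned between the first and last sample, in seconds.
    duration = (len(samples) - 1) / sample_rate
    # Two zero crossings per cycle of a sinusoid.
    return crossings / (2.0 * duration)


# Hypothetical test signal: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
tone = [
    math.sin(2.0 * math.pi * 440.0 * n / sample_rate + 0.1)
    for n in range(sample_rate)
]
estimate = zero_crossing_frequency(tone, sample_rate)
print(f"estimated frequency: {estimate:.1f} Hz")
```

In a filterbank setting, an estimate of this kind would be computed per channel and per frame. The accuracy depends on the frame length and on how narrow each band is.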
