A Study of Language and Classifier-independent Feature Analysis for Vocal Emotion Recognition

Every speech signal carries implicit information about emotions, which can be extracted with speech processing methods. In this paper, we propose an algorithm for extracting features that are independent of both the spoken language and the classification method, so that recognition performance remains comparatively good across different languages regardless of the classifier employed. The proposed algorithm is composed of three stages. In the first stage, we propose a feature ranking method that analyzes state-of-the-art voice quality features. In the second stage, we propose a method for finding the subset of features common to every language and classifier. In the third stage, we compare the recognition rate of our approach with that of state-of-the-art filter methods. We use three databases in different languages, namely Polish, Serbian and English. Three different classifiers are also employed, namely nearest neighbour, support vector machine and gradient descent neural network. We show that our method for selecting the most significant language-independent and method-independent features outperforms state-of-the-art filter methods in many cases.
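
The second stage, finding the features common to every language and classifier, can be illustrated with a minimal sketch. It is not the ranking or selection procedure of the paper, which the abstract does not specify. It assumes each (language, classifier) pair already has a feature list ordered from most to least significant, and it keeps only the features that fall in the top `k` of every list. The function name `common_top_features`, the parameter `k`, and the feature names in the usage example are hypothetical.

```python
def common_top_features(rankings, k):
    """Return the features found in the top-k of every ranking.

    rankings: dict mapping a (language, classifier) pair to a list of
              feature names ordered from most to least significant.
              (Assumed input format, not taken from the paper.)
    k:        how many leading features of each ranking to consider.
    """
    top_sets = [set(ranked[:k]) for ranked in rankings.values()]
    if not top_sets:
        return []

    # Keep only features shared by every language/classifier pair.
    common = set.intersection(*top_sets)

    # Order the shared features by mean rank position, ties broken by name.
    def mean_rank(feature):
        positions = [ranked.index(feature) for ranked in rankings.values()]
        return sum(positions) / len(positions)

    return sorted(common, key=lambda f: (mean_rank(f), f))


# Usage with made-up rankings (illustrative values only).
example_rankings = {
    ("Polish", "kNN"): ["pitch", "energy", "zcr", "mfcc1", "hnr"],
    ("Serbian", "SVM"): ["energy", "pitch", "mfcc1", "hnr", "zcr"],
    ("English", "NN"): ["pitch", "mfcc1", "energy", "zcr", "hnr"],
}
shared = common_top_features(example_rankings, k=3)
```

A strict intersection is only one possible choice. A voting threshold, such as keeping features that appear in most of the lists, would be a looser alternative under the same assumptions.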
