Emotional Speech Recognition Based on the Committee of Classifiers

This article presents a novel method for emotion recognition from speech based on a committee of classifiers. Different classification methods are juxtaposed in order to compare several alternative approaches to the final voting. The research is conducted on three different types of Polish emotional speech: acted out with the same content, acted out with different content, and spontaneous. A pool of descriptors commonly utilized for emotional speech recognition, expanded with sets of various perceptual coefficients, is used as input features. The results show that the presented approach improves performance with respect to a single classifier.
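
The abstract does not specify which classifiers form the committee or how the final vote is taken. As a rough illustration of the general idea only, the sketch below combines the labels returned by several classifiers using plain majority voting. The three threshold-based toy classifiers, the feature layout (mean pitch, energy, speaking rate), the thresholds, and the tie-breaking rule are all hypothetical and are not taken from the article.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A classifier here is any function mapping a feature vector to a label.
Classifier = Callable[[Sequence[float]], str]


def committee_predict(classifiers: List[Classifier],
                      features: Sequence[float]) -> str:
    """Return the label chosen by plain majority voting.

    Ties are broken in favour of the earliest committee member whose
    vote is among the most frequent labels (an assumption made for
    this sketch, not a rule stated in the article).
    """
    votes = [clf(features) for clf in classifiers]
    counts = Counter(votes)
    top = max(counts.values())
    for vote in votes:
        if counts[vote] == top:
            return vote
    raise ValueError("committee must contain at least one classifier")


# Hypothetical toy members; features = [mean pitch, energy, speaking rate].
def by_pitch(f: Sequence[float]) -> str:
    return "anger" if f[0] > 200.0 else "neutral"


def by_energy(f: Sequence[float]) -> str:
    return "anger" if f[1] > 0.5 else "neutral"


def by_rate(f: Sequence[float]) -> str:
    return "anger" if f[2] > 5.0 else "neutral"


committee = [by_pitch, by_energy, by_rate]
label = committee_predict(committee, [220.0, 0.7, 4.0])
```

In this toy case two of the three members vote for the same label, so the committee returns that label even though one member disagrees. Weighted or confidence-based voting would be alternative schemes of the kind the abstract says were compared.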
