Analysis of Features and Classifiers in Emotion Recognition Systems: Case Study of Slavic Languages

Today’s human-computer interaction systems have a broad variety of applications in which automatic recognition of human emotion is of great interest. The literature contains many different, more or less successful variants of such systems. This work is an attempt to clarify which speech features are the most informative, which classification structure is the most suitable for this type of task, and to what degree the results are influenced by database size, database quality, and the cultural characteristics of a language. The research is presented as a case study of Slavic languages.
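As a rough illustration of the kind of pipeline such feature-and-classifier comparisons are built on, the sketch below pools frame-level MFCCs into per-utterance statistics and cross-validates an SVM classifier. It is a minimal sketch, not the paper's method: the file names, emotion labels, and parameter values are hypothetical, and the study itself may rely on different features (e.g. prosodic descriptors extracted with openSMILE) and other classifiers (e.g. HMMs or neural networks).

```python
# Minimal speech emotion recognition sketch: MFCC statistics + SVM.
# Corpus paths, labels, and parameter values below are hypothetical placeholders.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

def utterance_features(wav_path, sr=16000, n_mfcc=13):
    """Pool frame-level MFCCs into one fixed-length vector per utterance."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, frames)
    # Simple statistical functionals: mean and standard deviation per coefficient.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical corpus: (wav file, emotion label) pairs; a real corpus would
# contain many utterances per emotion class.
corpus = [("anger_01.wav", "anger"), ("joy_01.wav", "joy"),
          ("sadness_01.wav", "sadness"), ("neutral_01.wav", "neutral")]

X = np.array([utterance_features(path) for path, _ in corpus])
y = np.array([label for _, label in corpus])

# Linear-kernel SVM with feature standardization; with enough utterances per
# class, k-fold cross-validation gives the kind of accuracy estimate that
# feature/classifier comparisons are usually based on.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print("mean accuracy:", scores.mean())
```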
