King Saud University Emotions Corpus: Construction, Analysis, Evaluation, and Comparison

Emotional speech recognition for Arabic has received far less attention in the literature than it has for other languages. In this paper, we describe the creation and verification of the King Saud University Emotions (KSUEmotions) corpus, which was released by the Linguistic Data Consortium (LDC) in 2017 as the first public Arabic emotional speech corpus. KSUEmotions contains emotional speech from twenty-three speakers from Saudi Arabia, Syria, and Yemen, and covers five emotions: neutral, happiness, sadness, surprise, and anger. The corpus content is verified in two ways: a human perceptual test, in which nine listeners rate the emotional performance in the audio files, and automatic emotion recognition. Two automatic emotion recognition systems are evaluated: a Residual Neural Network and a Convolutional Neural Network. This work also reports emotion recognition experiments for English using the Emotional Prosody Speech and Transcripts Corpus (EPST). The experimental work is conducted in three tracks: (i) monolingual, where independent experiments are carried out for Arabic and for English; (ii) multilingual, where the Arabic and English corpora are merged into a single mixed corpus; and (iii) cross-lingual, where models are trained on one language and tested on the other. A challenge encountered in this work is that the two corpora do not contain the same emotions. This problem is tackled by mapping the emotions to the arousal-valence space.
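The idea of mapping mismatched emotion labels to a shared arousal-valence space can be sketched as follows. This is only an illustration: the coordinates below are coarse sign values chosen for the example, not the mapping used in the paper, and the label "boredom" and the second label set are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# (valence, arousal) signs per label. These values are illustrative
# assumptions, not the coordinates used by the authors.
AV_SPACE: Dict[str, Tuple[int, int]] = {
    "neutral": (0, 0),
    "happiness": (1, 1),
    "sadness": (-1, -1),
    "surprise": (1, 1),    # assumed positive valence; surprise is ambiguous
    "anger": (-1, 1),
    "boredom": (-1, -1),   # hypothetical label from a second corpus
}


def to_arousal_valence(label: str) -> Tuple[int, int]:
    """Map a categorical emotion label to a point in the shared space."""
    return AV_SPACE[label.lower()]


def shared_targets(labels_a: Iterable[str], labels_b: Iterable[str]) -> Set[Tuple[int, int]]:
    """Return the points in the space that both label sets can reach."""
    points_a = {to_arousal_valence(label) for label in labels_a}
    points_b = {to_arousal_valence(label) for label in labels_b}
    return points_a & points_b


# Label set listed in the abstract, and a hypothetical second corpus.
corpus_a = ["neutral", "happiness", "sadness", "surprise", "anger"]
corpus_b = ["neutral", "anger", "boredom"]
common = shared_targets(corpus_a, corpus_b)
```

Under these assumptions, two corpora with different categorical labels can still share training targets, because labels such as "sadness" and "boredom" fall on the same point of the space.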
