Analysis of Linguistic and Prosodic Features of Bilingual Arabic–English Speakers for Speech Emotion Recognition

Speech emotion recognition (SER) research has usually focused on speakers' native language, most commonly targeting European and Asian languages. In the present study, a bilingual Arabic/English speech emotion database was elicited from 16 male and 16 female Egyptian participants in order to investigate how linguistic and prosodic features were affected by anger, fear, happiness, and sadness across Arabic and English emotional speech. The linguistic analysis indicated that the participants preferred to express their emotions indirectly, mainly through religious references, and that the female participants tended to use more tentative and emotionally expressive language, while the male participants tended toward more assertive and independent language. In the prosodic analysis, statistical t-tests showed that pitch, intensity, and speech rate were most indicative of anger and happiness, less relevant to fear, and scarcely significant for sadness. Speech emotion recognition performed with a linear support vector machine (SVM) boosted by AdaBoost further supported these results. Regarding first- and second-language linguistic features, there was no significant difference between the two languages in the words and structures chosen to express the different emotions. In terms of prosodic features, however, the female participants' speech showed higher pitch in Arabic in all cases, both genders showed similar intensity values across the two languages, and both spoke faster in Arabic than in English.
