Arabic speaker emotion classification using rhythm metrics and neural networks

In this paper, rhythm metrics are calculated and used to classify five Arabic speech emotions: neutral, sad, happy, surprised, and angry. Eight speakers (four male and four female) simulated the five emotions in their speech by uttering three selected sentences twice each. A human perception test was conducted with nine listeners (male and female). The results of a neural network-based automatic emotion recognition system using rhythm metrics were similar to the human perception test results, although less accurate. Anger was the most recognized speaker emotion and happiness was the least recognized. One of our findings is that the emotions of male speakers are easier to recognize than those of female speakers. In addition, we found that neural networks and rhythm metrics can be used for speaker emotion recognition from speech signals, but only when the dataset is sufficiently large.
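
Below is a minimal sketch of the kind of pipeline the abstract describes: classic rhythm metrics (%V, ΔV, ΔC, VarcoV, nPVI) computed from vocalic and consonantal interval durations, fed to a small neural network classifier. This is not the authors' implementation; it assumes interval durations are already available from a forced alignment (e.g., produced with Praat), uses scikit-learn's MLPClassifier as a stand-in for the paper's neural network, and fills X and y with random placeholders purely so the example runs end to end.

# Sketch: rhythm-metric features + a small neural network emotion classifier.
# Assumptions: interval durations come from a prior forced alignment;
# MLPClassifier stands in for the paper's network; data below is synthetic.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

EMOTIONS = ["neutral", "sad", "happy", "surprised", "angry"]

def rhythm_metrics(vowel_durs, cons_durs):
    """Classic rhythm metrics from vocalic/consonantal interval durations (seconds)."""
    v = np.asarray(vowel_durs, dtype=float)
    c = np.asarray(cons_durs, dtype=float)
    pct_v = v.sum() / (v.sum() + c.sum())   # %V: proportion of vocalic material
    delta_v, delta_c = v.std(), c.std()     # ΔV, ΔC: raw duration variability
    varco_v = 100 * delta_v / v.mean()      # VarcoV: rate-normalized ΔV
    # nPVI: mean normalized difference between successive vocalic intervals
    npvi_v = 100 * np.mean(np.abs(np.diff(v)) / ((v[:-1] + v[1:]) / 2))
    return [pct_v, delta_v, delta_c, varco_v, npvi_v]

# One metric vector per utterance; labels balanced across the five emotions.
rng = np.random.default_rng(0)
X = np.array([rhythm_metrics(rng.uniform(0.03, 0.2, 12),
                             rng.uniform(0.03, 0.2, 12)) for _ in range(50)])
y = np.array(EMOTIONS * 10)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=4).mean())

With real annotated data in place of the placeholders, cross-validated accuracy per emotion could then be compared against the human perception test, as the abstract reports.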
