Influences of age in emotion recognition of spontaneous speech: A case of an under-resourced language

Recognizing emotions in natural or spontaneous speech is considerably more difficult than in acted or elicited speech. Speech emotion recognition for real conversation, such as spontaneous speech, requires linguistic information about the speech to be incorporated into the recognition component to achieve a high recognition rate; however, the scarcity of digital speech resources for an under-resourced language makes this requirement hard to meet. In this paper, speech emotion recognition of spontaneous speech in the Malay language using prosodic features and a Random Forest classifier is presented. We also investigate the influence of age, categorized as children, young adults, and middle-aged adults, on emotion recognition. Ninety spontaneous speech sentences from 30 native speakers of Malay were collected and classified into three emotions: happy, angry, and sad. Results show that the spontaneous speech of the middle-aged group achieved the highest accuracy, followed by the children's group and finally the young adults. While sad emotion is recognized satisfactorily across all age groups, confusion exists between the happy and angry emotions.
