Voice Based Emotion Recognition with Convolutional Neural Networks for Companion Robots

To obtain emotion-related responses from robots, computers, and other intelligent machines, the first and decisive step is accurate emotion recognition. This paper presents an implementation of this function using a deep learning model, the Convolutional Neural Network (CNN). The architecture is an adaptation of an image-processing CNN, programmed in Python using the Keras model-level library with the TensorFlow backend. The theoretical background underlying the classification of emotions from voice parameters is briefly presented. According to the obtained results, the model achieves a mean accuracy of 71.33% for six emotions (happiness, fear, sadness, disgust, anger, surprise), which is comparable with the performance reported in the scientific literature. The original contributions of the paper are: the adaptation of the deep learning model for processing audio files, the training of the CNN with a set of recordings in the Romanian language, and an experimental software environment for generating test files.

Keywords: Voice Recognition, Emotion Recognition, Convolutional Neural Networks, Companion Robots, Pet Robots.
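As an illustration of how an image-processing CNN could be fed audio, the sketch below turns a waveform into a two-dimensional log-magnitude spectrogram and reshapes it into an image-like tensor. This is only an assumption about one possible preprocessing route: the abstract does not state which representation the authors used, and the function name `log_spectrogram`, the frame length, the hop size, the sampling rate, and the synthetic test tone are all hypothetical choices made for the example.

```python
import numpy as np

# The six emotion classes named in the abstract.
EMOTIONS = ["happiness", "fear", "sadness", "disgust", "anger", "surprise"]


def log_spectrogram(signal, frame_len=256, hop=128):
    """Return a (frames, bins) log-magnitude spectrogram of a 1-D signal.

    Hypothetical helper: frame_len and hop are illustrative defaults,
    not values taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Pad very short signals so that at least one full frame exists.
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Magnitude of the one-sided FFT of each frame, compressed with log(1 + x).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude)


# Synthetic stand-in for a recording: a 1-second, 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spec = log_spectrogram(tone)
# Add batch and channel axes so the array has the (batch, height, width, channels)
# layout that image CNNs typically expect.
cnn_input = spec[np.newaxis, ..., np.newaxis]
print(spec.shape, cnn_input.shape)
```

A tensor shaped like `cnn_input` could then be passed to a 2-D convolutional model whose final layer has one output per entry in `EMOTIONS`; the actual network layout used in the paper is not specified in this abstract.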
