Real-Time Emotional Speech Processing for Neurorobotics Applications

The human ability to perceive and interpret the emotional content of speech remains unmatched by artificial intelligent agents. Beyond the linguistic content of speech lie prosodic features that humans understand naturally; the goal of emotional speech processing systems is to extract and classify these so-called paralinguistic elements. Presented here is a proof-of-concept system designed to analyze speech in real time for coupled interactions with spiking neural models. Built on established feature extraction algorithms, the system provides two interface options to running simulations on the NeoCortical Simulator. Preliminary tests, using both new recordings and a subset of a published emotional speech database, yielded promising results.
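The prosodic cues such systems typically rely on include short-time energy and fundamental frequency (pitch) contours. As an illustrative sketch only (the abstract does not specify the feature set; frame sizes, search range, and the autocorrelation pitch estimator here are assumptions), per-frame energy and F0 can be computed as follows:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    # Slice a 1-D signal into overlapping frames (25 ms / 10 ms hop at 16 kHz).
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def prosodic_features(x, sr=16000):
    """Return per-frame log energy and an autocorrelation-based F0 estimate."""
    frames = frame_signal(x)
    energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)  # log short-time energy
    lo, hi = sr // 400, sr // 60  # search pitch lags in the 60-400 Hz range
    f0 = []
    for f in frames:
        f = f - f.mean()
        ac = np.correlate(f, f, mode="full")[len(f) - 1:]  # non-negative lags
        lag = lo + int(np.argmax(ac[lo:hi]))
        f0.append(sr / lag)
    return energy, np.array(f0)

# Sanity check: a pure 200 Hz tone should produce F0 estimates near 200 Hz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 200 * t)
energy, f0 = prosodic_features(tone, sr)
```

Statistics over these contours (mean, range, slope) are the kind of frame- and turn-level features commonly fed to an emotion classifier.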
