Real-time robust recognition of speakers' emotions and characteristics on mobile platforms

We demonstrate audEERING's sensAI technology running natively on low-resource mobile devices, applied to emotion analytics and speaker characterisation tasks. A showcase application for the Android platform is provided in which audEERING's highly noise-robust voice activity detection, based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN), is combined with our core emotion recognition and speaker characterisation engine directly on the mobile device. This eliminates the need for network connectivity and allows robust speaker state and trait recognition to be performed efficiently in real time, without network transmission lags. Real-time factors are benchmarked on a popular mobile device to demonstrate the efficiency, and average response times are compared to a server-based approach. The output of the emotion analysis is visualised graphically in the arousal-valence space, alongside the emotion category and further speaker characteristics.
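The real-time factor (RTF) used in the benchmark is conventionally defined as processing time divided by audio duration, with RTF < 1.0 meaning the analysis keeps up with the incoming audio. A minimal measurement sketch follows; the function names are hypothetical and do not reflect audEERING's actual API:

```python
import time

def real_time_factor(process, audio_samples, sample_rate):
    """Benchmark an analysis function: RTF = processing time / audio duration.

    `process` stands in for the on-device analysis engine (hypothetical here);
    `audio_samples` is a sequence of PCM samples at `sample_rate` Hz.
    """
    audio_duration = len(audio_samples) / sample_rate  # seconds of audio
    start = time.perf_counter()
    process(audio_samples)                             # run the analysis once
    elapsed = time.perf_counter() - start              # wall-clock seconds used
    return elapsed / audio_duration

# Example: a dummy "analysis" (mean absolute amplitude) on 5 s of silence at 16 kHz.
samples = [0.0] * (16000 * 5)
rtf = real_time_factor(lambda x: sum(abs(s) for s in x) / len(x), samples, 16000)
print(f"RTF = {rtf:.4f}")  # values below 1.0 indicate faster-than-real-time processing
```

The same ratio applies whether the processing happens on-device or on a server; for the server-based comparison, network transmission time would be added to the elapsed processing time.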