OpenEAR — Introducing the Munich Open-Source Emotion and Affect Recognition Toolkit

Various open-source toolkits exist for speech recognition and speech processing. These toolkits have greatly benefited the research community, e.g. by speeding up research. Yet no such freely available toolkit exists for automatic affect recognition from speech. We herein introduce a novel open-source affect and emotion recognition engine that integrates all necessary components in one highly efficient software package. The components include audio recording and audio file reading, state-of-the-art paralinguistic feature extraction, and pluggable classification modules. In this paper we introduce the engine and extensive baseline results. Pre-trained models for four affect recognition tasks are included in the openEAR distribution. The engine is tailored for multi-threaded, incremental on-line processing of live input in real time; however, it can also be used for batch processing of databases.
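The pipeline the abstract describes — low-level descriptors (LLDs) extracted per frame, statistical functionals applied over each LLD contour to yield a fixed-length utterance vector, and a pluggable classifier on top — can be sketched in a few lines. This is an illustrative sketch only, not openEAR's actual C++ API: the feature set is reduced to two toy LLDs (log energy, zero-crossing rate) with two functionals (mean, standard deviation), and all names here (`low_level_descriptors`, `NearestCentroid`, etc.) are hypothetical.

```python
import math

def frame_signal(samples, frame_len=400, hop=160):
    """Split a waveform into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def low_level_descriptors(frame):
    """Two toy LLDs per frame: log energy and zero-crossing rate."""
    energy = sum(s * s for s in frame) / len(frame)
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / len(frame)
    return math.log(energy + 1e-10), zcr

def functionals(contour):
    """Statistical functionals (mean, stddev) over one LLD contour."""
    mean = sum(contour) / len(contour)
    var = sum((v - mean) ** 2 for v in contour) / len(contour)
    return mean, math.sqrt(var)

def extract_features(samples):
    """LLDs per frame, then functionals per contour: a fixed-length vector."""
    llds = [low_level_descriptors(f) for f in frame_signal(samples)]
    feats = []
    for contour in zip(*llds):   # one contour per LLD type
        feats.extend(functionals(contour))
    return feats

class NearestCentroid:
    """Minimal stand-in for a pluggable classifier: one centroid per class."""
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, l in zip(X, y) if l == label]
            self.centroids[label] = [sum(c) / len(c) for c in zip(*rows)]
        return self

    def predict(self, x):
        return min(self.centroids,
                   key=lambda l: sum((a - b) ** 2
                                     for a, b in zip(x, self.centroids[l])))

if __name__ == "__main__":
    sr = 16000
    def tone(freq, amp, n=8000):
        return [amp * math.sin(2 * math.pi * freq * i / sr) for i in range(n)]
    # Two toy "utterances": a loud, higher-pitched one and a quiet, low one.
    clf = NearestCentroid().fit(
        [extract_features(tone(300, 1.0)), extract_features(tone(80, 0.05))],
        ["aroused", "calm"])
    print(clf.predict(extract_features(tone(280, 0.9))))  # prints "aroused"
```

In openEAR itself the same structure scales up to thousands of features (many LLD types, many functionals) and to real classifiers such as SVMs; the point here is only the LLD-functional-classifier layering.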
