Development and Evaluation of the Emotional Slovenian Speech Database - EmoLUKS

This paper describes a speech database built from 17 Slovenian radio dramas. The dramas were obtained from the national radio-and-television station RTV Slovenia and were made available to the university under an academic license for processing and annotating the audio material. The utterances of one male and one female speaker were transcribed, segmented, and then annotated with the emotional states of the speakers. The annotation of the emotional states was conducted in two stages with our own web-based crowdsourcing application. The final emotional speech database consists of 1385 recordings, 975 from the male speaker and 410 from the female speaker, and contains labeled emotional speech with a total duration of around 1 hour and 15 minutes. The paper presents the two-stage annotation process used to label the data and demonstrates the usefulness of the employed annotation methodology. Baseline emotion recognition experiments are also presented. The results are reported as unweighted and weighted average recalls and precisions for 2-class and 7-class recognition experiments.
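As an illustration of the reported evaluation metrics, the sketch below computes unweighted and weighted average recall and precision with scikit-learn. The 7-class label arrays are hypothetical placeholders, not the EmoLUKS data or the paper's actual results.

```python
import numpy as np
from sklearn.metrics import recall_score, precision_score

# Hypothetical gold labels and classifier predictions for a 7-class
# setup (class indices 0..6); these stand in for per-utterance
# emotion annotations and model output.
y_true = np.array([0, 0, 1, 2, 3, 4, 5, 6, 6, 6])
y_pred = np.array([0, 1, 1, 2, 3, 4, 5, 6, 0, 6])

# Unweighted average recall (UAR): the mean of per-class recalls,
# so rare emotion classes count as much as the dominant one.
uar = recall_score(y_true, y_pred, average="macro")

# Weighted average recall (WAR): per-class recalls weighted by class
# frequency, which imbalanced data skews toward the majority class.
war = recall_score(y_true, y_pred, average="weighted")

# Unweighted and weighted average precision, computed analogously.
uap = precision_score(y_true, y_pred, average="macro", zero_division=0)
wap = precision_score(y_true, y_pred, average="weighted", zero_division=0)

print(f"UAR={uar:.3f}  WAR={war:.3f}  UAP={uap:.3f}  WAP={wap:.3f}")
```

On class-imbalanced emotional speech corpora such as this one, the unweighted averages are usually the more informative figures, since the weighted variants are dominated by the majority (typically neutral) class.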
