This paper presents a comparison of different classification techniques for classifying a speaker's emotional state into one of two classes: aroused and normal. The comparison was conducted using WEKA (the Waikato Environment for Knowledge Analysis), an open-source software suite comprising a collection of machine learning algorithms for data mining. The aim of this paper is to investigate the efficiency of different classification methods in recognizing the emotional state of a speaker from features obtained with a constrained version of Maximum Likelihood Linear Regression (CMLLR). For our experiments we adopted the multi-modal AvID database of emotions, which comprises 1708 utterance samples, each lasting at least 15 seconds. The database was randomly divided into a training set and a testing set in a ratio of 5:1. Since the database contains many more samples belonging to the neutral class than to the aroused class, the latter was over-sampled to ensure that both classes contained equal numbers of samples in the training set. The built-in WEKA classifiers were divided into five groups based on their theoretical foundations: classifiers related to Bayes' theorem, distance-based classifiers, discriminant classifiers, neural networks, and decision tree classifiers. From each group we report the results of the best-performing algorithms with respect to the unweighted average recall.
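Because the two classes are heavily imbalanced, accuracy alone would be misleading; the unweighted average recall (UAR) used for evaluation instead averages the per-class recalls so that both classes contribute equally. A minimal sketch of this computation, with hypothetical labels (not data from the paper), might look as follows:

```python
from collections import Counter, defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls: each class contributes equally,
    regardless of how many samples belong to it."""
    totals = Counter(y_true)            # samples per true class
    correct = defaultdict(int)          # correctly classified per class
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / totals[c] for c in totals) / len(totals)

# Hypothetical imbalanced test labels for the two classes.
y_true = ["neutral"] * 8 + ["aroused"] * 2
y_pred = ["neutral"] * 8 + ["aroused", "neutral"]

# recall(neutral) = 8/8 = 1.0, recall(aroused) = 1/2 = 0.5
print(unweighted_average_recall(y_true, y_pred))  # 0.75
```

Note that plain accuracy on this example would be 0.9, while the UAR of 0.75 exposes the weaker performance on the minority (aroused) class.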