RANSAC-based training data selection for emotion recognition from spontaneous speech

Training datasets containing spontaneous emotional expressions are often imperfect due to the ambiguities and difficulties of labeling such data by human observers. In this paper, we present a Random Sample Consensus (RANSAC) based training approach for the problem of emotion recognition from spontaneous speech recordings. Our motivation is to insert a data cleaning process into the training phase of the Hidden Markov Models (HMMs) in order to remove suspicious labeled instances that may exist in the training dataset. Our experiments using HMMs with various numbers of states and Gaussian mixtures per state indicate that using RANSAC in the training phase improves the unweighted recall rate on the test set by up to 2.84%. This improvement in the accuracy of the classifier is shown to be statistically significant using McNemar's test.
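
The abstract does not spell out the selection loop, so the following is only a minimal sketch of the generic RANSAC idea applied to labeled training data, under several assumptions. A nearest-centroid classifier (the hypothetical helpers `train_centroids` and `predict`) stands in for the per-class HMMs; each hypothesis is fit on `per_class` randomly drawn instances of every class; and an instance joins the consensus set when the hypothesis model predicts its given label. The names `ransac_select`, `per_class`, and `n_iter` are illustrative, not taken from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Model = Dict[str, Vector]


def train_centroids(xs: Sequence[Vector], ys: Sequence[str]) -> Model:
    """Hypothetical stand-in for per-class HMM training: one mean vector per label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()}


def predict(model: Model, x: Vector) -> str:
    """Return the label whose centroid is nearest to x (squared Euclidean distance)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def ransac_select(xs: Sequence[Vector], ys: Sequence[str], per_class: int,
                  n_iter: int, seed: int = 0) -> Tuple[List[int], Model]:
    """RANSAC-style selection of a label-consistent training subset (a sketch).

    Each iteration fits a model on a small random sample drawn from every class,
    then collects the consensus set: training instances whose given label agrees
    with that model's prediction. The largest consensus set is kept, and a final
    model is refit on it.
    """
    rng = random.Random(seed)
    by_label: Dict[str, List[int]] = defaultdict(list)
    for i, y in enumerate(ys):
        by_label[y].append(i)
    if any(len(idx) < per_class for idx in by_label.values()):
        raise ValueError("per_class exceeds the size of some class")

    best: List[int] = []
    for _ in range(n_iter):
        # Minimal random sample, stratified so that every class gets a model.
        sample = [i for idx in by_label.values() for i in rng.sample(idx, per_class)]
        model = train_centroids([xs[i] for i in sample], [ys[i] for i in sample])
        # Consensus set: instances whose label agrees with the hypothesis model.
        consensus = [i for i in range(len(xs)) if predict(model, xs[i]) == ys[i]]
        if len(consensus) > len(best):
            best = consensus

    # Refit on the cleaned subset (with HMMs, this is where they would be retrained).
    final_model = train_centroids([xs[i] for i in best], [ys[i] for i in best])
    return best, final_model
```

Keeping the largest consensus set mirrors the original RANSAC rule of keeping the hypothesis with the most inliers. With a real HMM back end, the consensus test might instead rely on per-class likelihoods; the abstract does not say which criterion is used.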

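The two evaluation quantities named in the abstract can be computed roughly as below. This is a minimal sketch assuming the usual definitions: unweighted recall as the mean of per-class recalls (each emotion class weighted equally regardless of its size), and McNemar's test with continuity correction on the paired predictions of a baseline and a RANSAC-trained classifier. The function names `unweighted_recall` and `mcnemar` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def unweighted_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted average recall: mean of per-class recalls, so each class counts equally."""
    recalls: List[float] = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def mcnemar(y_true: Sequence[str], pred_a: Sequence[str],
            pred_b: Sequence[str]) -> Tuple[int, int, float, float]:
    """McNemar's test with continuity correction on paired classifier outputs.

    b = samples classifier A gets right and classifier B gets wrong,
    c = samples A gets wrong and B gets right.
    Returns (b, c, chi-square statistic, p-value).
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        return b, c, 0.0, 1.0
    stat = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 dof: P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return b, c, stat, p_value
```

Only the discordant counts b and c enter the statistic: test samples that both classifiers get right, or both get wrong, carry no information about which classifier is better.
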