Speech Emotion Recognition using an Enhanced Co-Training Algorithm

Earlier speech emotion recognition systems typically rely on supervised learning, training classifiers on large numbers of labeled examples. However, labeling that much data takes considerable time and human effort. This paper presents an enhanced co-training algorithm that uses a large amount of unlabeled speech utterances to build a semi-supervised learning system. It uses two conditionally independent attribute views (i.e., temporal features and statistical features) of unlabeled examples to augment a much smaller set of labeled examples. Our experimental results show that, compared with supervised training, the proposed system achieves an absolute improvement in average accuracy of 9.0% for the female model and 7.4% for the male model. Moreover, the enhanced co-training algorithm performs comparably to the prototype co-training algorithm while reducing the classification noise introduced by labeling errors during semi-supervised learning.
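As a rough illustration of the two-view idea, the sketch below implements a generic co-training loop, not the enhanced variant, whose details the abstract does not give. It makes several assumptions: the classifier is a nearest-centroid stand-in, confidence is the distance margin between the two closest centroids, and the function names, toy feature values, and the `rounds` and `per_round` parameters are all hypothetical.

```python
import math

def centroids(X, y):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict(cents, x):
    """Return (label, margin); margin is an assumed confidence proxy."""
    dists = sorted((math.dist(x, c), label) for label, c in cents.items())
    best_dist, best_label = dists[0]
    margin = dists[1][0] - best_dist if len(dists) > 1 else 0.0
    return best_label, margin

def co_train(lab_a, lab_b, labels, unl_a, unl_b, rounds=5, per_round=2):
    """Generic co-training: each view's classifier labels its most
    confident unlabeled examples, which then join the shared labeled set."""
    train_a, train_b, y = list(lab_a), list(lab_b), list(labels)
    pool = list(range(len(unl_a)))  # indices of still-unlabeled examples
    for _ in range(rounds):
        for train, unl in ((train_a, unl_a), (train_b, unl_b)):
            if not pool:
                break
            cents = centroids(train, y)  # retrain on the current labeled set
            ranked = sorted(pool, key=lambda i: predict(cents, unl[i])[1],
                            reverse=True)
            for i in ranked[:per_round]:
                label, _ = predict(cents, unl[i])
                # The pseudo-labeled example is added to BOTH views.
                train_a.append(unl_a[i])
                train_b.append(unl_b[i])
                y.append(label)
                pool.remove(i)
    return centroids(train_a, y), centroids(train_b, y)

# Toy data (made up): view A is 2-D, view B is 1-D.
lab_a = [[0.0, 0.0], [1.0, 1.0]]
lab_b = [[0.0], [1.0]]
labels = ["neutral", "angry"]
unl_a = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 1.0]]
unl_b = [[0.1], [0.9], [0.2], [0.8]]

model_a, model_b = co_train(lab_a, lab_b, labels, unl_a, unl_b)
print(predict(model_a, [0.1, 0.1])[0], predict(model_b, [0.9])[0])
```

In this sketch a wrong pseudo-label is never revisited once added, which is one way the labeling noise mentioned in the abstract can enter a co-training loop.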
