Dimensional Emotion Prediction from Multiple Cues and Modalities Using an Improved KNN Algorithm

Automatic continuous dimensional emotion prediction and analysis has gained increasing attention from researchers in recent years, but most existing work focuses on a single cue or a single modality. In this paper, we propose an improved KNN algorithm that estimates audio-visual dimensional emotions in the valence-arousal space from multiple cues and modalities. We select new samples from the original data set by applying k-means clustering, and assign a different weight to each of the K selected neighbors to obtain the dimensional estimates. We use a gradient descent (GD) algorithm to train the model-level fusion models. Experiments show that the improved KNN algorithm outperforms traditional KNN; feature-level fusion performs best for arousal prediction, model-level fusion performs best for valence estimation, and both are more effective than either single modality alone.
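The abstract only outlines the method, so the sketch below is a minimal, hypothetical Python realisation of the three ingredients it names: k-means sample selection, weighted KNN regression, and gradient-descent training of a model-level fusion. The function names, the inverse-distance weighting scheme, and the linear form of the fusion model are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_prototypes(X, y, n_clusters, seed=0):
    """k-means sample selection: replace the raw training set with one
    prototype per cluster, labelled with the mean target of its members."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    protos = km.cluster_centers_
    proto_y = np.stack([y[km.labels_ == c].mean(axis=0)
                        for c in range(n_clusters)])
    return protos, proto_y

def weighted_knn_predict(x, protos, proto_y, k=5, eps=1e-8):
    """Weighted KNN regression: nearer neighbours receive larger
    (here: inverse-distance, an assumed scheme) weights in the
    valence/arousal estimate."""
    d = np.linalg.norm(protos - x, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + eps)
    w /= w.sum()
    return w @ proto_y[idx]          # weighted average of neighbour targets

def train_fusion(preds, targets, lr=0.05, epochs=2000):
    """Model-level fusion: learn one weight per modality (plus a bias) by
    gradient descent on the mean squared error.  A linear combination is
    an assumption; the paper does not specify the fusion model's form.
    preds: (n_samples, n_modalities) per-modality predictions."""
    n, m = preds.shape
    w = np.full(m, 1.0 / m)
    b = 0.0
    for _ in range(epochs):
        err = preds @ w + b - targets      # residuals
        w -= lr * preds.T @ err / n        # gradient step on weights
        b -= lr * err.mean()               # gradient step on bias
    return w, b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))            # toy frame-level features
    y = rng.uniform(-1, 1, size=(500, 2))     # toy (valence, arousal) labels

    protos, proto_y = select_prototypes(X, y, n_clusters=50)
    print(weighted_knn_predict(X[0], protos, proto_y, k=5))

    # model-level fusion of two hypothetical modality predictors
    audio_pred = y[:, 0] + rng.normal(scale=0.2, size=500)
    video_pred = y[:, 0] + rng.normal(scale=0.3, size=500)
    w, b = train_fusion(np.stack([audio_pred, video_pred], axis=1), y[:, 0])
    print("fusion weights:", w, "bias:", b)
```

Replacing the full training set with cluster prototypes both speeds up neighbour search and smooths label noise, which is one plausible reason the improved KNN could outperform the traditional version on continuous targets.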
