论文信息 - Dynamic speech emotion recognition with state-space models

Dynamic speech emotion recognition with state-space models

Automatic emotion recognition from speech has been focused mainly on identifying categorical or static affect states, but the spectrum of human emotion is continuous and time-varying. In this paper, we present a recognition system for dynamic speech emotion based on state-space models (SSMs). The prediction of the unknown emotion trajectory in the affect space spanned by Arousal, Valence, and Dominance (A-V-D) descriptors is cast as a time series filtering task. The state space models we investigated include a standard linear model (Kalman filter) as well as novel non-linear, non-parametric Gaussian Processes (GP) based SSM. We use the AVEC 2014 database for evaluation, which provides ground truth A-V-D labels which allows state and measurement functions to be learned separately simplifying the model training. For the filtering with GP SSM, we used two approximation methods: a recently proposed analytic method and Particle filter. All models were evaluated in terms of average Pearson correlation R and root mean square error (RMSE). The results show that using the same feature vectors, the GP SSMs achieve twice higher correlation and twice smaller RMSE than a Kalman filter.

[1] Carl E. Rasmussen,et al. State-Space Inference and Learning with Gaussian Processes , 2010, AISTATS.

[2] Nando de Freitas,et al. Sequential Monte Carlo Methods in Practice , 2001, Statistics for Engineering and Information Science.

[3] Carl E. Rasmussen,et al. Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC , 2013, NIPS.

[4] Ieee Staff. 2017 25th European Signal Processing Conference (EUSIPCO) , 2017 .

[5] Carl E. Rasmussen,et al. Robust Filtering and Smoothing with Gaussian Processes , 2012, IEEE Transactions on Automatic Control.

[6] Dieter Fox,et al. GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models , 2008, IROS.

[7] Heng Wang,et al. Depression recognition based on dynamic facial and vocal expression features using partial least square regression , 2013, AVEC@ACM Multimedia.

[8] Björn W. Schuller,et al. Abandoning emotion classes - towards continuous emotion recognition with modelling of long-range dependencies , 2008, INTERSPEECH.

[9] Björn Schuller,et al. Opensmile: the munich versatile and fast open-source audio feature extractor , 2010, ACM Multimedia.

[10] S. Haykin. Kalman Filtering and Neural Networks , 2001 .

[11] Youngmoo E. Kim,et al. Prediction of Time-Varying Musical Mood Distributions Using Kalman Filtering , 2010, 2010 Ninth International Conference on Machine Learning and Applications.

[12] Christopher K. I. Williams,et al. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) , 2005 .

[13] Jongmin Kim,et al. Phoneme Classification using Constrained Variational Gaussian Process Dynamical System , 2012, NIPS.

[14] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[15] David J. Fleet,et al. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Gaussian Process Dynamical Model , 2007 .

[16] Gustav Eje Henter,et al. Gaussian process dynamical models for nonparametric speech representation and synthesis , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[17] Tomoko Matsui,et al. Music Genre and Emotion Recognition Using Gaussian Processes , 2014, IEEE Access.

[18] M. Deisenroth,et al. A general perspective on Gaussian filtering and smoothing: Explaining current and deriving new algorithms , 2011, Proceedings of the 2011 American Control Conference.

[19] Uwe D. Hanebeck,et al. Analytic moment-based Gaussian process filtering , 2009, ICML '09.

[20] Carl E. Rasmussen,et al. Gaussian Processes for Machine Learning (GPML) Toolbox , 2010, J. Mach. Learn. Res..

[21] Mohamed Chetouani,et al. Robust continuous prediction of human emotions using multiscale dynamic cues , 2012, ICMI '12.

[22] Björn W. Schuller,et al. AVEC 2014: 3D Dimensional Affect and Depression Recognition Challenge , 2014, AVEC '14.

[23] Nadia Bianchi-Berthouze,et al. Naturalistic Affective Expression Classification by a Multi-stage Approach Based on Hidden Markov Models , 2011, ACII.