Emotion recognition from speech signals via a probabilistic echo-state network

The paper presents a probabilistic echo-state network (π-ESN) for density estimation over variable-length sequences of multivariate random vectors. The π-ESN combines the reservoir of an ESN with a parametric density model based on radial basis functions. A constrained maximum-likelihood training algorithm suitable for sequence classification is introduced, and extensions of the algorithm to unsupervised clustering and semi-supervised learning (SSL) of sequences are proposed. Experiments in emotion recognition from speech signals are conducted on the WaSeP dataset. Compared with established techniques, the π-ESN yields the highest recognition accuracies and shows promising clustering and SSL capabilities.
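To make the architecture more concrete, the following is a minimal sketch of the general idea only: a fixed random reservoir maps a variable-length sequence to a final state, and one diagonal Gaussian (a single radial basis function) per class is fitted to those states by closed-form maximum likelihood. It is not the paper's constrained training algorithm. All function names, the toy data, the row-norm weight scaling, and the variance floor are assumptions made for illustration.

```python
import math
import random


def make_reservoir(n_in, n_res, scale=0.9, seed=0):
    """Fixed random input and recurrent weights; they are never trained."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_res)]
    w = [[rng.uniform(-1, 1) for _ in range(n_res)] for _ in range(n_res)]
    # Assumption: rescale so the largest absolute row sum equals `scale` < 1.
    # This is a simple sufficient condition for a contractive reservoir,
    # not the more common spectral-radius scaling.
    max_row = max(sum(abs(v) for v in row) for row in w)
    w = [[scale * v / max_row for v in row] for row in w]
    return w_in, w


def final_state(seq, w_in, w):
    """Run the reservoir over a sequence of input vectors; return the last state."""
    x = [0.0] * len(w)
    for u in seq:
        x = [
            math.tanh(
                sum(a * b for a, b in zip(w_in[i], u))
                + sum(a * b for a, b in zip(w[i], x))
            )
            for i in range(len(w))
        ]
    return x


def fit_gaussian(states, floor=1e-3):
    """Maximum-likelihood mean and diagonal variance (plus a small floor)."""
    n, d = len(states), len(states[0])
    mean = [sum(s[j] for s in states) / n for j in range(d)]
    var = [sum((s[j] - mean[j]) ** 2 for s in states) / n + floor for j in range(d)]
    return mean, var


def log_density(x, mean, var):
    """Log of a diagonal Gaussian density evaluated at reservoir state x."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - 0.5 * (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def classify(seq, w_in, w, models):
    """Pick the class whose density assigns the highest log-likelihood."""
    x = final_state(seq, w_in, w)
    return max(models, key=lambda c: log_density(x, *models[c]))


# Toy data (hypothetical): two classes of 2-D sequences with lengths 4 to 7.
w_in, w = make_reservoir(n_in=2, n_res=8)
train = {
    "pos": [[[0.8, 0.8]] * length for length in range(4, 8)],
    "neg": [[[-0.8, -0.8]] * length for length in range(4, 8)],
}
models = {
    label: fit_gaussian([final_state(s, w_in, w) for s in seqs])
    for label, seqs in train.items()
}
print(classify([[0.8, 0.8]] * 9, w_in, w, models))
```

Using only the final reservoir state keeps the sketch short; a density over the full state trajectory, a mixture of several radial basis functions per class, or training constraints on the density parameters would each be a different design choice that this example does not attempt.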
