Brno University of Technology system for Interspeech 2009 emotion challenge

This paper describes Brno University of Technology (BUT) system for the Interspeech 2009 Emotion Challenge. Our submitted system for the Open Performance Sub-Challenge uses acoustic frame based features as a front-end and Gaussian Mixture Models as a back-end. Different feature types and modeling approaches successfully applied in speakerand language recognition are investigated and we can achieve an 16% and 9% relative improvement over the best dynamic and static baseline system on the 5-class task, respectively.

[1]  Jordan Cohen,et al.  Vocal tract normalization in speech recognition: Compensating for systematic speaker variability , 1995 .

[2]  Lukás Burget,et al.  Contour modeling of prosodic and acoustic features for speaker recognition , 2008, 2008 IEEE Spoken Language Technology Workshop.

[3]  Pavel Matejka,et al.  Hierarchical Structures of Neural Networks for Phoneme Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[4]  Hynek Hermansky,et al.  RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..

[5]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[6]  Björn W. Schuller,et al.  The INTERSPEECH 2009 emotion challenge , 2009, INTERSPEECH.

[7]  Lukás Burget,et al.  Comparison of scoring methods used in speaker recognition with Joint Factor Analysis , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[8]  Lukás Burget,et al.  Analysis of Feature Extraction and Channel Compensation in a GMM Speaker Recognition System , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[9]  Lukás Burget,et al.  Brno University of Technology System for NIST 2005 Language Recognition Evaluation , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[10]  Patrick Kenny,et al.  A Study of Interspeaker Variability in Speaker Verification , 2008, IEEE Transactions on Audio, Speech, and Language Processing.