Learning emotional speech by using Dirichlet Process Mixtures

Our aim in this paper is to illustrate the effectiveness of the Dirichlet Process Mixture (DPM) model for emotional speech class density estimation when the number of Gaussian mixture components is unknown. The problem is modeled as a two-class classification problem in which the classes are anger and non-anger. The performance of the algorithm is evaluated on features extracted from the EMO-DB emotion dataset, and we observe that including prior information increases the non-anger recall rate. The introduced feature set performs perceptual analysis in the time, spectral and Bark domains, based on the Perceptual Evaluation of Audio Quality (PEAQ) model described in the standard ITU-R BS.1387-1, which provides a mathematical model resembling the human auditory system. Unlike existing systems, the proposed feature set learns the statistical characteristics of emotional differences and hence enables us to represent the statistics of emotional audio with a small number of features.
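The following Python sketch illustrates DPM class-density estimation for a two-class (anger vs. non-anger) problem. It is not the authors' implementation, and it makes several assumptions:

- Features are one-dimensional.
- Components are Gaussian with a known, shared variance `sigma2`.
- A conjugate base measure `N(mu0, tau2)` is placed on the component means.
- Inference uses collapsed Gibbs sampling (Neal's Algorithm 3), which may differ from the inference scheme used in the paper.

The names `DPMDensity` and `classify`, all hyperparameter values, the synthetic data and the class priors are hypothetical. The priors are not meant to reproduce the prior information used in the paper.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


class DPMDensity:
    """Hypothetical 1-D Dirichlet Process Mixture density estimator.

    Assumed model: Gaussian components with known shared variance sigma2,
    a N(mu0, tau2) base measure on component means, concentration alpha,
    and collapsed Gibbs sampling over cluster assignments.
    """

    def __init__(self, alpha=1.0, sigma2=0.25, mu0=0.0, tau2=10.0, seed=0):
        self.alpha, self.sigma2, self.mu0, self.tau2 = alpha, sigma2, mu0, tau2
        self.rng = random.Random(seed)
        self.snapshots = []  # (counts, sums) per cluster for each retained sweep

    def _predictive(self, x, n, s):
        """Predictive density of x for a cluster holding n points that sum to s."""
        prec = 1.0 / self.tau2 + n / self.sigma2
        mean = (self.mu0 / self.tau2 + s / self.sigma2) / prec
        return normal_pdf(x, mean, 1.0 / prec + self.sigma2)

    def fit(self, data, sweeps=200, burn_in=100):
        """Run collapsed Gibbs sweeps, keeping cluster statistics after burn-in."""
        self.snapshots = []
        z = [0] * len(data)  # start with every point in a single cluster
        counts, sums = {0: len(data)}, {0: sum(data)}
        next_label = 1
        for sweep in range(sweeps):
            for i, x in enumerate(data):
                k = z[i]
                counts[k] -= 1
                sums[k] -= x
                if counts[k] == 0:  # forget clusters that became empty
                    del counts[k], sums[k]
                labels = list(counts)
                # Existing cluster c: n_c * predictive; new cluster: alpha * prior predictive.
                weights = [counts[c] * self._predictive(x, counts[c], sums[c]) for c in labels]
                weights.append(self.alpha * self._predictive(x, 0, 0.0))
                choice = self.rng.choices(range(len(weights)), weights=weights)[0]
                if choice == len(labels):  # open a new component
                    k = next_label
                    next_label += 1
                    counts[k], sums[k] = 0, 0.0
                else:
                    k = labels[choice]
                z[i] = k
                counts[k] += 1
                sums[k] += x
            if sweep >= burn_in:
                self.snapshots.append((dict(counts), dict(sums)))
        return self

    def density(self, x):
        """Predictive density p(x | data), averaged over the retained sweeps."""
        total = 0.0
        for counts, sums in self.snapshots:
            n = sum(counts.values())
            p = self.alpha * self._predictive(x, 0, 0.0)
            p += sum(counts[c] * self._predictive(x, counts[c], sums[c]) for c in counts)
            total += p / (n + self.alpha)
        return total / len(self.snapshots)


def classify(x, models, priors):
    """Bayes rule: pick the class maximizing prior * class-conditional DPM density."""
    return max(models, key=lambda c: priors[c] * models[c].density(x))


# Hypothetical 1-D feature values standing in for per-utterance acoustic features.
gen = random.Random(42)
anger = [gen.gauss(4.0, 0.5) for _ in range(40)] + [gen.gauss(7.0, 0.5) for _ in range(40)]
non_anger = [gen.gauss(-1.0, 0.5) for _ in range(60)] + [gen.gauss(1.5, 0.5) for _ in range(60)]

models = {
    "anger": DPMDensity(seed=1).fit(anger),
    "non-anger": DPMDensity(seed=2).fit(non_anger),
}
priors = {"anger": 0.4, "non-anger": 0.6}  # illustrative class priors
print(classify(6.5, models, priors))
print(classify(0.0, models, priors))
```

Each class model creates and removes components during sampling. As a result, the number of Gaussian components per class is never fixed in advance, which is the property the DPM brings to this problem. With the settings above, the script should print `anger` and then `non-anger`.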
