Analysis of excitation source features of speech for emotion recognition

During the production of emotional speech, the components of the speech production mechanism deviate from their behaviour in normal speech. The objective of this study is to capture the deviations in features related to the excitation source component of speech, and to develop a system for automatic emotion recognition based on these deviations. The emotions considered in this study are anger, happy, neutral, and sad. The study shows that deviations in the excitation source features at the subsegmental level carry useful information, which can be exploited to develop an emotion recognition system. A hierarchical binary decision tree approach is used for classification.
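To make the classification stage concrete, the following is a minimal sketch of a hierarchical binary decision tree over the four emotions, assuming precomputed excitation source feature vectors. The tree structure (an arousal-style split into {anger, happy} vs {neutral, sad} at the root, followed by a binary classifier in each branch), the use of SVMs at the nodes, and the synthetic 8-dimensional features are all illustrative assumptions, not the study's exact configuration.

```python
# Hypothetical sketch: hierarchical binary decision tree for four emotions.
# Features here are synthetic stand-ins for subsegmental excitation source
# features; in practice these would be derived from the speech signal.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_features(n, offset):
    # Synthetic 8-dim feature vectors, one well-separated cluster per emotion.
    return rng.normal(loc=offset, scale=1.0, size=(n, 8))

data = {
    "anger":   make_features(50, 3.0),
    "happy":   make_features(50, 1.5),
    "neutral": make_features(50, -1.5),
    "sad":     make_features(50, -3.0),
}

def train_node(pos_labels, neg_labels):
    # Binary SVM separating two groups of emotions at one tree node.
    X = np.vstack([data[l] for l in pos_labels + neg_labels])
    y = np.array([1] * sum(len(data[l]) for l in pos_labels)
                 + [0] * sum(len(data[l]) for l in neg_labels))
    return SVC(kernel="rbf").fit(X, y)

# Root: high-arousal (anger, happy) vs low-arousal (neutral, sad).
root = train_node(["anger", "happy"], ["neutral", "sad"])
# Leaves: one binary classifier inside each branch.
high = train_node(["anger"], ["happy"])
low = train_node(["neutral"], ["sad"])

def classify(x):
    # Walk the tree: root decision first, then the branch-level classifier.
    x = x.reshape(1, -1)
    if root.predict(x)[0] == 1:
        return "anger" if high.predict(x)[0] == 1 else "happy"
    return "neutral" if low.predict(x)[0] == 1 else "sad"

print(classify(make_features(1, 3.0)[0]))
```

The appeal of the hierarchical scheme is that each node only solves an easier two-way problem, so confusable pairs (such as anger and happy, which share high arousal) are separated by a dedicated classifier rather than a single four-way decision.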
