Classification of Emotional Speech Based on an Automatically Elaborated Hierarchical Classifier

Current machine-based techniques for vocal emotion recognition consider only a finite number of clearly labeled emotional classes, whereas the kinds and number of emotional classes are typically application dependent. Previous studies have shown that, because of the ambiguous nature of affect classes, a multistage classification scheme helps to improve emotion classification accuracy. However, these multistage classification schemes were elaborated manually, taking into account the underlying emotional classes to be discriminated. In this paper, we propose an automatically elaborated hierarchical classification scheme (ACS), driven by an evidence theory-based embedded feature-selection scheme (ESFS), for application-dependent emotion recognition. Evaluated on the Berlin dataset with 68 features and six emotion states, ACS showed its effectiveness, achieving a classification accuracy of 71.38%, compared with 71.52% for our previous dimensional model-driven but still manually elaborated multistage classifier (DEC). On the DES dataset with five emotion states, ACS achieved a recognition rate of 76.74%, compared with 81.22% for a manually elaborated multistage classification scheme (DEC).
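To make the notion of a multistage scheme concrete, the following is a minimal sketch of a two-stage classifier in which a first stage picks a coarse group and a second stage picks an emotion within that group. It is not the ACS or DEC method: the hierarchy here is specified by hand (the manual case that the abstract contrasts with), each stage is a plain nearest-centroid rule rather than an ESFS-driven classifier, and the class name `TwoStageClassifier`, the group names, the emotion labels, and the two toy features are all assumptions made for illustration.

```python
from statistics import mean


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def nearest(x, centroids):
    """Return the key whose centroid has the smallest squared distance to x."""
    return min(
        centroids,
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])),
    )


class TwoStageClassifier:
    """Hypothetical two-stage classifier with a hand-specified hierarchy."""

    def __init__(self, groups):
        # groups maps a coarse group name to the labels it contains.
        self.groups = groups
        self.label_to_group = {
            label: group for group, labels in groups.items() for label in labels
        }

    def fit(self, X, y):
        by_group, by_label = {}, {}
        for x, label in zip(X, y):
            by_group.setdefault(self.label_to_group[label], []).append(x)
            by_label.setdefault(label, []).append(x)
        # Stage 1: one centroid per coarse group.
        self.group_centroids = {g: centroid(r) for g, r in by_group.items()}
        # Stage 2: within each group, one centroid per label.
        self.label_centroids = {
            g: {lab: centroid(by_label[lab]) for lab in labels if lab in by_label}
            for g, labels in self.groups.items()
        }
        return self

    def predict_one(self, x):
        group = nearest(x, self.group_centroids)          # coarse decision
        return nearest(x, self.label_centroids[group])    # fine decision


# Toy data with two made-up features per utterance (illustrative only).
X = [
    [0.9, 0.1], [0.8, 0.2],   # anger
    [0.9, 0.9], [0.8, 0.8],   # happiness
    [0.1, 0.1], [0.2, 0.2],   # sadness
    [0.1, 0.9], [0.2, 0.8],   # boredom
]
y = ["anger", "anger", "happiness", "happiness",
     "sadness", "sadness", "boredom", "boredom"]

clf = TwoStageClassifier(
    {"high": ["anger", "happiness"], "low": ["sadness", "boredom"]}
).fit(X, y)

print(clf.predict_one([0.85, 0.15]))
print(clf.predict_one([0.10, 0.85]))
```

In a scheme of this shape, an error at the first stage cannot be corrected at the second, which is one reason the choice of hierarchy matters and why deriving it automatically from the data, as the abstract proposes, is of interest.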
