Comparing Multiple Classifiers for Speech-Based Detection of Self-Confidence - A Pilot Study

The aim of this study is to compare several classifiers commonly used in speech emotion recognition (SER) for the speech-based detection of self-confidence. A standard acoustic feature set was computed, yielding 170 features per one-minute speech sample (e.g. fundamental frequency, intensity, formants, MFCCs). To identify speech correlates of self-confidence, the lectures of 14 female participants were recorded, resulting in 306 one-minute speech segments. Five expert raters independently assessed the impression of self-confidence. Several classification models (e.g. Random Forest, Support Vector Machine, Naïve Bayes, Multi-Layer Perceptron) and ensemble classifiers (AdaBoost, Bagging, Stacking) were trained. AdaBoost achieved the best performance, both with a single base model (AdaBoost with Logistic Regression: 75.2% class-wise averaged recognition rate) and with average boosting (59.3%) in speaker-independent settings.