Speech Features for Discriminating Stress Using Branch and Bound Wrapper Search

Stress detection from speech is less explored than Automatic Emotion Recognition, and it is still unclear which features best discriminate stress. The VOCE project aims to classify speech as stressed or not stressed in real time, using acoustic-prosodic features only. We therefore search for the best discriminating feature subsets within a set of 6125 features extracted with the openSMILE toolkit, plus 160 Teager Energy Operator (TEO) features. We perform feature selection with a Mutual Information (MI) filter followed by a branch-and-bound wrapper heuristic around an SVM classifier. Since many feature subsets are selected, we analyse them in terms of the chosen features and of classifier performance, including true positive and false positive rates. The results show that the best feature types for our application are Audio Spectral, MFCC, PCM and TEO, and we reach generalisation accuracies as high as 70.4%.
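
To make the filter-then-wrapper pipeline concrete, the sketch below shows the general idea in Python with scikit-learn: rank features by mutual information with the stress label, then evaluate candidate subsets with an SVM scored by cross-validation. For brevity it uses a greedy forward wrapper rather than the branch-and-bound heuristic used in the paper, and the placeholder data, function names and thresholds (keep=50, max_features=5) are illustrative assumptions, not the authors' settings.

import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def mi_filter(X, y, keep=100):
    """Rank features by mutual information with the label and keep the top `keep`."""
    mi = mutual_info_classif(X, y, random_state=0)
    return np.argsort(mi)[::-1][:keep]

def wrapper_forward_search(X, y, candidates, max_features=10, cv=5):
    """Greedy forward wrapper: repeatedly add the feature that most improves SVM CV accuracy."""
    selected, best_score = [], 0.0
    clf = SVC(kernel="rbf")
    while len(selected) < max_features:
        scores = {}
        for f in candidates:
            if f in selected:
                continue
            subset = selected + [f]
            scores[f] = cross_val_score(clf, X[:, subset], y, cv=cv).mean()
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_score:
            break  # no remaining candidate improves the score; stop
        selected.append(f_best)
        best_score = scores[f_best]
    return selected, best_score

# Usage with random placeholder data; the real input would be the
# openSMILE + TEO feature matrix described in the abstract.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 300))   # 120 utterances x 300 features (placeholder)
y = rng.integers(0, 2, size=120)  # binary stressed / not-stressed labels
top = mi_filter(X, y, keep=50)
subset, acc = wrapper_forward_search(X, y, list(top), max_features=5)
print(subset, round(acc, 3))

The design choice mirrors the abstract: the cheap MI filter prunes the very large feature set before the expensive wrapper stage, so the SVM only has to be retrained on a manageable number of candidate subsets.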
