StressSense: detecting stress in unconstrained acoustic environments using smartphones

Stress can have long-term adverse effects on individuals' physical and mental well-being. Changes in the speech production process are among the many physiological changes that occur during stress. Microphones embedded in mobile phones, which people carry ubiquitously, provide the opportunity to monitor stress continuously and non-invasively in real-life situations. We propose StressSense for unobtrusively recognizing stress from the human voice using smartphones. We investigate methods for adapting a one-size-fits-all stress model to individual speakers and scenarios. We demonstrate that the StressSense classifier can robustly identify stress across multiple individuals in diverse acoustic environments: using model adaptation, StressSense achieves 81% and 76% accuracy for indoor and outdoor environments, respectively. We show that StressSense can be implemented on commodity Android phones and run in real time. To the best of our knowledge, StressSense is the first system to consider voice-based stress detection and model adaptation in diverse real-life conversational situations using smartphones.
