How's my mood and stress?: an efficient speech analysis library for unobtrusive monitoring on mobile phones

The human voice encodes a wealth of information about emotion, mood, stress, and mental state. With mobile phones (among the most widely used devices in body area networks), this information becomes potentially available to a host of applications and can enable richer, more appropriate, and more satisfying human-computer interaction. In this paper we describe AMMON (Affective and Mental health MONitor), a low-footprint C library designed for widely available phones as an enabler of such applications. The library incorporates both the core features for emotion recognition (from the Interspeech 2009 Emotion Recognition Challenge) and the most important features for mental-health analysis (glottal timing features). To run the library comfortably on feature phones (still the most widely used class of phones), we implemented the routines in fixed-point arithmetic and minimized the computational and memory footprint. On identical test data, emotion and stress classification accuracy was indistinguishable from that of a state-of-the-art reference system running on a PC: 75% accuracy on two-class emotion classification and 84% accuracy on binary classification of stressed versus neutral speech. The library uses 30% of real time on a 1 GHz processor during emotion recognition and 70% during stress and mental-health analysis.
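
The abstract's central engineering claim is the fixed-point reimplementation of the feature-extraction routines for phones without floating-point hardware. AMMON's source is not reproduced here, so the following is only a minimal C sketch of the kind of Q15 fixed-point primitive and low-level descriptor (mean-square frame energy) such a library depends on; all names and formats are illustrative assumptions, not AMMON's actual API.

#include <stdint.h>
#include <stdio.h>

/* Q15 fixed point: values in [-1, 1) stored as int16_t, scaled by 2^15.
 * (Illustrative convention; AMMON's internal format is not published here.) */
typedef int16_t q15_t;

/* Saturating Q15 multiply: widen to 32 bits, round, shift back, clamp. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = ((int32_t)a * (int32_t)b + (1 << 14)) >> 15;
    if (p > 32767)  p = 32767;
    if (p < -32768) p = -32768;
    return (q15_t)p;
}

/* Mean-square frame energy, a typical low-level descriptor, computed
 * without floating point. Products are Q30; a 64-bit accumulator avoids
 * overflow over a full frame. */
static q15_t frame_energy(const q15_t *frame, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)frame[i] * (int32_t)frame[i];  /* Q30 products  */
    return (q15_t)((acc / n) >> 15);                   /* back to Q15   */
}

int main(void)
{
    q15_t x[4] = {16384, -16384, 8192, 0};  /* 0.5, -0.5, 0.25, 0.0 */
    printf("0.5 * 0.5 = %d (Q15)\n", q15_mul(16384, 16384)); /* 8192 = 0.25 */
    printf("energy    = %d (Q15)\n", frame_energy(x, 4));    /* 4608 = 0.140625 */
    return 0;
}

Keeping products in 32-bit registers, accumulating in 64 bits, and saturating on overflow is the standard way to get float-free DSP on FPU-less processors, which is precisely the constraint feature phones impose.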
