Affective speech interface in serious games for supporting therapy of mental disorders

We describe the design, implementation and evaluation of a novel speech interface that forms part of a platform for the development of serious games. The interface consists of two components: speech recognition and emotion recognition from speech. The underlying platform was designed and implemented to support the development of serious games for the cognitive-based treatment of patients with mental disorders. The implementation of the speech interface is based on the Olympus/RavenClaw framework, which has been extended to meet the needs of the specific serious games and their application domain by integrating new components, such as emotion recognition from speech. The speech interface was evaluated on a purposely collected, domain-specific dataset. The speech recognition experiments show that emotional speech moderately affects the performance of the speech interface. Furthermore, the emotion detectors performed satisfactorily for the emotional states of interest, Anger and Boredom, and contributed to successful modelling of the patient's emotional state. A recent evaluation of the serious games showed that patients began to show new styles of coping with negative emotions in normal stressful life situations.
