Characterization of human emotions and preferences for text-to-speech systems using multimodal neuroimaging methods

Voice user interfaces and speech quality are normally assessed using subjective user experience testing methods and/or objective instrumental techniques. However, recent advances in neurophysiological tools allow useful human behavioral constructs, such as emotion, perception, preference, and task performance, to be measured in real time. Electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) are well-received neuroimaging tools used in a variety of domains, including health science, neuromarketing, user experience (UX) research, and multimedia quality of experience (QoE). This paper therefore describes the impact of natural and text-to-speech (TTS) signals on a user's affective state (valence and arousal) and preferences, measured using neuroimaging tools (EEG and fNIRS) and a subjective user study. The EEG results showed that natural and high-quality TTS speech generate "positive valence", which was inferred from higher asymmetric EEG activation over the frontal head region. The fNIRS results showed increased activation in the orbitofrontal cortex (OFC) during decision making in favor of natural and high-quality TTS speech signals. However, natural and TTS signals produced significantly different arousal levels.
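As an illustration of how valence can be inferred from asymmetric frontal EEG activation, the sketch below computes a frontal alpha asymmetry index, a measure widely used in the EEG emotion literature. It is a minimal sketch and not the paper's actual pipeline: the 8-13 Hz alpha band, the left/right electrode pairing (for example F3/F4), the function names, and the synthetic signals are all assumptions made for the example.

```python
import numpy as np


def band_power(signal, fs, low=8.0, high=13.0):
    """Mean periodogram power of `signal` between `low` and `high` Hz."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()  # remove the DC offset
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / (fs * signal.size)
    mask = (freqs >= low) & (freqs <= high)
    return psd[mask].mean()


def frontal_alpha_asymmetry(left, right, fs):
    """ln(right alpha power) - ln(left alpha power).

    Alpha power is conventionally treated as inversely related to cortical
    activation, so a positive value suggests relatively greater left-frontal
    activation, which is commonly associated with positive valence.
    """
    return np.log(band_power(right, fs)) - np.log(band_power(left, fs))


# Synthetic demo (assumed values): a 10 Hz rhythm with twice the amplitude
# on the hypothetical right-frontal channel.
fs = 256                                   # sampling rate in Hz (assumption)
t = np.arange(0, 4, 1.0 / fs)              # 4 seconds of samples
left = 1.0 * np.sin(2 * np.pi * 10 * t)    # hypothetical left channel (e.g. F3)
right = 2.0 * np.sin(2 * np.pi * 10 * t)   # hypothetical right channel (e.g. F4)

faa = frontal_alpha_asymmetry(left, right, fs)
print(f"Frontal alpha asymmetry: {faa:.3f}")
```

In practice the index would be computed per trial or per stimulus on artifact-cleaned recordings, and a smoothed spectral estimate (such as Welch's method) would usually replace the raw periodogram used here.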
