Using fNIRS to Characterize Human Perception of TTS System Quality, Comprehension, and Fluency: Preliminary Findings

The quality of synthesized speech from different Text-to-Speech (TTS) systems is traditionally evaluated using subjective tests based on user ratings. Subjective testing, however, is challenging because human perception is variable and complex. As a result, there has recently been a shift towards exploring new objective techniques for evaluating the quality of TTS systems. In this paper, we describe our initial effort to characterize human perception of TTS quality using neurophysiological insights obtained from a neuroimaging technology called functional near-infrared spectroscopy (fNIRS). This approach allowed us to establish a link between the human decision-making process and the quality of different TTS systems. We found significant correlations between perceived quality and several fNIRS features related to cerebral haemodynamics. These preliminary results help establish the potential of fNIRS as an important tool for evaluating the quality of TTS systems.
