Text-to-speeches: evaluating the perception of concurrent speech by blind people

Over the years, screen readers have been an essential tool for blind users accessing digital information. Yet their sequential nature undermines blind people's ability to find relevant information efficiently, despite the browsing strategies these users have developed. We propose taking advantage of the Cocktail Party Effect: people are able to focus on a single speech source among several conversations, yet still identify relevant content in the background. In contrast to a single sequential speech channel, we hypothesize that blind people can leverage concurrent speech channels to quickly get the gist of digital information. In this paper, we present an experiment with 23 participants that aims to understand blind people's ability to search for relevant content while listening to two, three, or four concurrent speech channels. Our results suggest that it is easy to identify the relevant source with two or three concurrent talkers. Moreover, both two and three sources may be used to understand the relevant source's content, depending on the task's intelligibility demands and the user's characteristics.
