Speech-to-screen: spatial separation of dialogue from noise towards improved speech intelligibility for the small screen

Can externalizing dialogue in the presence of stereo background noise improve speech intelligibility? This question was investigated for headphone audio with head tracking, to explore potential future developments for small-screen devices. A quantitative listening experiment tasked participants with identifying target words in spoken sentences played over headphones in the presence of background noise. Sixteen combinations of three independent variables were tested: speech and noise locations (internalized/externalized), video (on/off), and masking noise type (stationary/fluctuating). The results revealed that the largest improvements in speech intelligibility came from the video-on condition and from externalizing the speech at the screen while retaining the masking noise in the stereo mix.
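The sixteen conditions follow from a full factorial crossing of the binary factors, with the speech and noise locations varied independently of one another. A minimal sketch of how such a design matrix could be enumerated (factor names are illustrative assumptions, not taken from the paper):

```python
# Illustrative only: enumerate the 16 experimental conditions as the
# full factorial combination of four binary factors. Factor and level
# names here are assumptions chosen to mirror the abstract's wording.
from itertools import product

speech_location = ("internalized", "externalized")
noise_location = ("internalized", "externalized")
video = ("on", "off")
masking_noise = ("stationary", "fluctuating")

# Each tuple is one condition: (speech, noise, video, masker)
conditions = list(product(speech_location, noise_location,
                          video, masking_noise))

print(len(conditions))  # 2 x 2 x 2 x 2 = 16 conditions
```

Enumerating conditions this way makes it easy to randomize presentation order per participant or to counterbalance across the listening panel.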
