Development and evaluation of video recordings for the OLSA matrix sentence test

OBJECTIVE: The aim was to create and validate an audiovisual version of the German matrix sentence test (MST) that reuses the existing audio-only speech material.
DESIGN: Videos were recorded and dubbed with the audio of the existing German MST. The study evaluates the MST under conditions spanning audio-only, visual-only, and audiovisual modalities, speech in quiet and in noise, and open- and closed-set response formats.
SAMPLE: One female talker recorded repetitions of the German MST sentences. Twenty-eight young normal-hearing participants completed the evaluation study.
RESULTS: The audiovisual benefit in quiet was 7.0 dB in sound pressure level (SPL); in noise it was 4.9 dB in signal-to-noise ratio (SNR). Speechreading scores for visual-only sentences ranged from 0% to 84% speech reception (mean = 50%). Audiovisual speech reception thresholds (SRTs) showed a larger standard deviation than audio-only SRTs and improved successively with the number of lists performed. The final video recordings are openly available.
CONCLUSIONS: Despite the asynchronies inherent in dubbing, the video material yielded speech intelligibility comparable to values reported in the literature. Because of ceiling effects, adaptive procedures targeting 80% intelligibility should be used, and at least one or two training lists should be completed beforehand.
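The conclusions recommend adaptive procedures that converge on 80% intelligibility rather than the conventional 50% SRT. A minimal sketch of one way such a target can be reached is a Kaernbach-style weighted up-down staircase (this is an illustration, not the specific adaptive procedure used in the study); the `respond` callback, which scores one sentence presentation at a given SNR, is a hypothetical stand-in for the actual test interface:

```python
def adaptive_track(respond, start_snr=0.0, base_step=1.0, n_trials=20):
    """Weighted up-down staircase converging on ~80% intelligibility.

    Kaernbach's rule: the up/down step sizes satisfy
    s_up / s_down = p / (1 - p), so for p = 0.8 the SNR rises
    4x faster after a miss than it falls after a hit, and the
    track balances where the listener is correct 80% of the time.
    """
    snr = start_snr
    history = []
    for _ in range(n_trials):
        correct = respond(snr)      # True if the sentence was scored correct
        history.append(snr)
        if correct:
            snr -= base_step        # make the task harder after a hit
        else:
            snr += 4 * base_step    # make it much easier after a miss
    # SRT estimate: mean SNR over the later, converged half of the track
    return sum(history[n_trials // 2:]) / (n_trials - n_trials // 2)
```

With a deterministic simulated listener who is correct whenever the SNR exceeds a fixed threshold, the track settles into a cycle around that threshold in which 4 of every 5 trials are correct, i.e. 80%.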
