Clarity-2021 Challenges: Machine Learning Challenges for Advancing Hearing Aid Processing

In recent years, rapid advances in speech technology have been made possible by machine learning challenges such as CHiME, REVERB, Blizzard, and Hurricane. In the Clarity project, the machine learning approach is applied to the problem of hearing aid processing of speech in noise, where current technology for enhancing the speech signal for the hearing aid wearer is often ineffective. The scenario is a simulated cuboid-shaped living room containing a single listener, a single target speaker, and a single interferer, which is either a competing talker or domestic noise. All sources are static; the target is always within ±30° azimuth of the listener and at the same elevation, and the interferer is an omnidirectional point source at the same elevation. The target speech comes from an open-source 40-speaker British English speech database collected for this purpose. This paper provides a baseline description of the round-one Clarity challenges for both enhancement (CEC1) and prediction (CPC1). To the authors’ knowledge, these are the first machine learning challenges to consider the problem of hearing aid speech signal processing.
