An Introduction to the Speech Enhancement for Augmented Reality (SPEAR) Challenge

It is well known that microphone arrays can be used to enhance a target speaker in a noisy, reverberant environment, with both spatial (e.g., beamforming) and statistical (e.g., source separation) methods proving effective. Head-worn microphone arrays inherently sample a sound field from an egocentric perspective: when the head moves, the apparent direction of even static sound sources changes with respect to the array. Traditionally, enhancement algorithms have aimed to be robust to head motion, but hearable devices and augmented reality (AR) headsets/glasses contain additional sensors that offer the potential to adapt to, or even exploit, head motion. The recently released EasyCom database contains microphone array recordings of group conversations made in a realistic restaurant-like acoustic scene. In addition to egocentric recordings made with AR glasses, it provides extensive metadata, including the position and orientation of speakers. This paper describes the use and adaptation of EasyCom for a new IEEE SPS Data Challenge.
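To make the egocentric effect concrete, the sketch below computes the apparent azimuth of a static talker as the wearer turns their head, together with the resulting time difference of arrival (TDOA) at a two-microphone pair. It is a horizontal-plane simplification, and its conventions are assumptions not taken from EasyCom: positions in metres, yaw measured counter-clockwise from the world x-axis, a hypothetical 0.14 m microphone spacing along the wearer's left-right axis, and a speed of sound of 343 m/s. The function names are illustrative only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def apparent_azimuth_deg(source_xy, head_xy, head_yaw_deg):
    """Azimuth of a source relative to the wearer's look direction.

    Horizontal-plane sketch: world x/y coordinates in metres, yaw measured
    counter-clockwise from the world x-axis. The result is in degrees in
    [-180, 180), where 0 is straight ahead and positive is to the left.
    """
    dx = source_xy[0] - head_xy[0]
    dy = source_xy[1] - head_xy[1]
    world_azimuth = math.degrees(math.atan2(dy, dx))
    # Rotating the world direction into the head frame subtracts the head yaw.
    relative = world_azimuth - head_yaw_deg
    # Wrap into [-180, 180).
    return (relative + 180.0) % 360.0 - 180.0


def pair_tdoa_s(azimuth_deg, spacing_m=0.14):
    """Far-field TDOA (right mic minus left mic) for a hypothetical
    two-microphone pair on the wearer's left-right axis."""
    return spacing_m * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND


if __name__ == "__main__":
    source = (2.0, 0.0)  # static talker 2 m in front of the starting pose
    head = (0.0, 0.0)    # the wearer stays in place and only turns the head
    for yaw in (0.0, 30.0, 90.0):
        az = apparent_azimuth_deg(source, head, yaw)
        print(f"yaw={yaw:5.1f} deg  azimuth={az:6.1f} deg  "
              f"tdoa={pair_tdoa_s(az) * 1e6:7.1f} us")
```

Although the talker never moves, both its azimuth and its inter-microphone delay change with head yaw. A beamformer steered with fixed delays would therefore drift off the talker once the head turns, which is why head-orientation metadata of the kind EasyCom provides is useful for adapting the processing.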
