Directional Interference Suppression Using a Spatial Relative Transfer Function Feature

Many speech enhancement systems consist of a beamformer and a spectral suppression postfilter. While it is well understood how to design beamformers to suppress either non-directional or directional interference, their suppression ability is limited by the number, type, and positions of the microphones. Spectral postfilters can further increase the suppression, but they are usually designed to suppress only non-directional noise. In this work, we propose a spatially selective spectral suppressor that addresses both directional and non-directional interference. The proposed suppressor is based on the relative transfer function of the target source location. While existing directional suppression techniques are limited to far-field scenarios or certain microphone geometries, we propose a general approach that places no restrictions on the microphone array and makes no far-field assumption. We show that the proposed spatial suppressor is able to suppress noise and directional interfering speakers, which substantially improves the performance of a speech recognizer and reduces undesired recognition of interfering talkers.
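To make the idea of a suppressor driven by the target's relative transfer function (RTF) more concrete, the following is a minimal illustrative sketch, not the estimator proposed in this work. It assumes that the target RTF vector is already known for one frequency bin, and it uses a simple normalized inner product as the spatial feature. The function names `rtf_similarity` and `spatial_gain`, the parameters `sharpness` and `floor`, and all numeric values are hypothetical choices made only for illustration.

```python
import cmath
import math


def rtf_similarity(x, a):
    """Normalized |a^H x| between an observed bin vector x and an RTF vector a.

    Returns a value in [0, 1]; 1 means x is a scaled copy of a.
    This feature is an assumption of this sketch, not the paper's definition.
    """
    inner = sum(ai.conjugate() * xi for ai, xi in zip(a, x))
    norm_x = math.sqrt(sum(abs(xi) ** 2 for xi in x))
    norm_a = math.sqrt(sum(abs(ai) ** 2 for ai in a))
    if norm_x == 0.0 or norm_a == 0.0:
        return 0.0
    # Clamp to guard against rounding slightly above 1.
    return min(1.0, abs(inner) / (norm_x * norm_a))


def spatial_gain(x, a, sharpness=8.0, floor=0.1):
    """Map the similarity to a spectral gain (hypothetical mapping).

    sharpness controls how quickly the gain drops away from the target RTF;
    floor limits the maximum attenuation.
    """
    return max(floor, rtf_similarity(x, a) ** sharpness)


# Hypothetical RTFs for a 3-microphone array, relative to microphone 0.
target_rtf = [1.0 + 0j, 0.8 * cmath.exp(-0.5j), 0.6 * cmath.exp(-1.1j)]
interferer_rtf = [1.0 + 0j, 0.9 * cmath.exp(1.2j), 0.7 * cmath.exp(2.0j)]

# One time-frequency bin dominated by the target, one by the interferer.
target_bin = [0.7 * cmath.exp(0.3j) * a for a in target_rtf]
interferer_bin = [0.7 * a for a in interferer_rtf]

g_target = spatial_gain(target_bin, target_rtf)
g_interf = spatial_gain(interferer_bin, target_rtf)
print(g_target, g_interf)
```

Because the feature depends only on how well the observed inter-microphone relations match the target RTF, the sketch needs neither a far-field model nor a particular array geometry, which is the property the abstract emphasizes. In a complete system such a gain would be computed per time-frequency bin and applied to the beamformer output.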
