U-Net Based Direct-Path Dominance Test for Robust Direction-of-Arrival Estimation

Identifying the time-frequency bins dominated by the direct-path sound of the target speaker can significantly improve the robustness of direction-of-arrival (DOA) estimation. However, correctly extracting the direct-path sound is challenging, especially in adverse environments. In this paper, a U-Net based direct-path dominance test method is proposed. By exploiting the efficient segmentation capability of the U-Net architecture, the direct-path information can be effectively retrieved by a dedicated multi-task neural network. Moreover, training and inference of the neural network require the input of only a single microphone, which circumvents the array-structure dependence faced by common end-to-end deep learning based methods. Simulations demonstrate that the proposed method achieves significantly higher estimation accuracy in highly reverberant and low signal-to-noise ratio environments.
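
The idea of restricting DOA estimation to direct-path dominated time-frequency bins can be illustrated with a minimal sketch. This is not the method of the paper: it assumes a hypothetical binary or soft mask `mask` (standing in for the output of a direct-path dominance test), a two-microphone far-field model, and a simple phase-difference estimator in place of a full array processor. The function name `masked_doa_estimate` and all of its parameters are illustrative assumptions.

```python
import numpy as np


def masked_doa_estimate(X1, X2, mask, freqs, mic_distance,
                        threshold=0.5, c=343.0):
    """Estimate a far-field DOA in degrees from a two-microphone STFT pair.

    Illustrative sketch only. X1 and X2 are complex STFTs of shape
    (n_freqs, n_frames); mask has the same shape and is assumed to mark
    direct-path dominated bins (hypothetical output of a dominance test).
    """
    # Keep only bins that pass the (assumed) direct-path dominance test.
    keep = mask > threshold
    # Discard DC and frequencies above the spatial-aliasing limit c / (2 d).
    f_col = freqs[:, None]
    keep &= (f_col > 0) & (f_col < c / (2.0 * mic_distance))
    if not keep.any():
        raise ValueError("no time-frequency bins passed the mask")

    # Inter-microphone phase difference per bin, assuming microphone 2
    # receives the wavefront later than microphone 1 for positive angles.
    phase = np.angle(X2 * np.conj(X1))
    f_grid = np.broadcast_to(f_col, X1.shape)
    delay = -phase[keep] / (2.0 * np.pi * f_grid[keep])

    # Far-field model: delay = d * sin(theta) / c. The median gives some
    # robustness against the few bad bins that slip through the mask.
    sin_theta = np.clip(c * delay / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(np.median(sin_theta))))
```

In this sketch, bins rejected by the mask never contribute to the estimate, so corrupted phase information in reverberation-dominated bins does not bias the result. In practice the mask would come from a trained network or a classical dominance test, and the per-bin estimator would typically be replaced by an array-specific method.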
