Online Direction of Arrival Estimation Based on Deep Learning

Direction of arrival (DOA) estimation is an important topic in microphone array processing. Conventional methods work well in relatively clean conditions but suffer from noise and reverberation distortions. Recently, deep learning-based methods show the robustness to noise and reverberation. However, the performance is degraded rapidly or even model cannot work when microphone array structure changes. So it has to retrain the model with new data, which is a huge work. In this paper, we propose a supervised learning algorithm for DOA estimation combining convolutional neural network (CNN) and long short term memory (LSTM). Experimental results show that the proposed method can improve the accuracy significantly. In addition, due to an input feature design, the proposed method can adapt to a new microphone array conveniently only use a very small amount of data.

[1]  Petre Stoica,et al.  Maximum likelihood methods for direction-of-arrival estimation , 1990, IEEE Trans. Acoust. Speech Signal Process..

[2]  Dong Yu,et al.  Single-channel mixed speech recognition using deep neural networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[3]  Jacob Benesty,et al.  Time-delay estimation via linear interpolation and cross correlation , 2004, IEEE Transactions on Speech and Audio Processing.

[4]  Rodney W. Johnson,et al.  Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy , 1980, IEEE Trans. Inf. Theory.

[5]  Emanuel A. P. Habets,et al.  Broadband doa estimation using convolutional neural networks trained with noise signals , 2017, 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[6]  Michael S. Brandstein,et al.  A robust method for speech signal time-delay estimation in reverberant rooms , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[8]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[9]  Kazunori Komatani,et al.  Sound source localization based on deep neural networks with directional activate function exploiting phase information , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[10]  Kung Yao,et al.  Source localization and beamforming , 2002, IEEE Signal Process. Mag..

[11]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[12]  G. Carter,et al.  The generalized correlation method for estimation of time delay , 1976 .

[13]  Jont B. Allen,et al.  Image method for efficiently simulating small‐room acoustics , 1976 .

[14]  R. O. Schmidt,et al.  Multiple emitter location and signal Parameter estimation , 1986 .

[15]  Wonyong Sung,et al.  A statistical model-based voice activity detection , 1999, IEEE Signal Processing Letters.

[16]  Haizhou Li,et al.  A learning-based approach to direction of arrival estimation in noisy and reverberant environments , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[17]  Daniel P. W. Ellis,et al.  Noise Robust Pitch Tracking by Subband Autocorrelation Classification , 2012, INTERSPEECH.

[18]  Jacob Benesty,et al.  Real-time passive source localization: a practical linear-correction least-squares approach , 2001, IEEE Trans. Speech Audio Process..

[19]  Jun Du,et al.  Speech separation of a target speaker based on deep neural networks , 2014, 2014 12th International Conference on Signal Processing (ICSP).