A learning-based approach to direction of arrival estimation in noisy and reverberant environments

This paper presents a learning-based approach to direction of arrival (DOA) estimation from microphone array input. Traditional signal processing methods, such as the classic least squares (LS) method, rely on strong assumptions about the signal model and on accurate estimates of the time delay of arrival (TDOA). They work well only in relatively clean conditions and degrade under noise and reverberation. In this paper, we propose a learning-based approach that learns robust DOA estimation from a large amount of simulated noisy and reverberant microphone array input. Specifically, we extract features from the generalised cross correlation (GCC) vectors and use a multilayer perceptron neural network to learn the nonlinear mapping from these features to the DOA. One advantage of the learning-based method is that the DOA estimates are expected to become more accurate as more training data become available. Experimental results on simulated data show that the proposed learning-based method produces much better results than the state-of-the-art LS method. Tests on real data recorded in meeting rooms also show a lower root-mean-square error (RMSE) than the LS method.
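As a rough illustration of the pipeline described above, the following Python sketch computes a GCC feature vector for one microphone pair and passes it through a small multilayer perceptron. It is a minimal sketch under stated assumptions, not the paper's implementation: the PHAT weighting, the lag window, the layer sizes, the tanh activation, and the helper names `gcc_phat_features` and `mlp_forward` are all hypothetical choices, and the network weights are random (untrained).

```python
import numpy as np


def gcc_phat_features(x1, x2, max_lag):
    """GCC values for lags -max_lag..max_lag (hypothetical helper).

    Assumes PHAT weighting; a positive peak lag means x2 is delayed
    relative to x1.
    """
    n = len(x1) + len(x2)  # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12  # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index 0 corresponds to lag -max_lag.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


def mlp_forward(features, w1, b1, w2, b2):
    """One-hidden-layer perceptron mapping GCC features to a scalar output.

    Layer sizes and the tanh activation are assumptions for illustration.
    """
    hidden = np.tanh(features @ w1 + b1)
    return hidden @ w2 + b2


# Toy usage: a white-noise source reaching the second microphone 5 samples late.
rng = np.random.default_rng(0)
source = rng.standard_normal(1024)
delay, max_lag = 5, 10
mic1 = source
mic2 = np.concatenate((np.zeros(delay), source[:-delay]))

feats = gcc_phat_features(mic1, mic2, max_lag)
peak_lag = int(np.argmax(feats)) - max_lag  # lag of the strongest correlation

# Untrained network with random weights, only to show the data flow.
w1 = rng.standard_normal((feats.size, 16)) * 0.1
b1 = np.zeros(16)
w2 = rng.standard_normal((16, 1)) * 0.1
b2 = np.zeros(1)
doa_output = mlp_forward(feats, w1, b1, w2, b2)
```

In a real system the features from all microphone pairs would presumably be concatenated and the weights fitted on simulated noisy, reverberant data; the sketch only shows how a GCC vector can serve as the network input.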
