Robust and Fast Localization of Single Speech Source Using a Planar Array

Heavy computational load and acoustic interference are two major problems for speech source localization in real applications. Conventional methods can mitigate one problem, but only at the expense of the other. This letter proposes a direction-of-arrival (DOA) estimation algorithm that is both computationally efficient and robust to acoustic interference. Robustness is addressed in two ways. The first is an eigenanalysis-based enhancement that reduces acoustic interference such as noise and reverberation. The second is a set of coefficients that weight the pairwise time delays to limit the effect of delay outliers on the DOA estimate. Computational efficiency comes from a concave cost function whose optimum gives the DOA estimate in closed form, so the grid search commonly used in conventional algorithms is no longer needed. Experiments are conducted in both simulated and real environments with a 9-element circular array. The proposed algorithm runs about ten times faster than steered response power with phase transform (SRP-PHAT) and is also more robust.
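The idea of replacing a grid search with a closed-form estimate built from weighted pairwise delays can be illustrated with a minimal sketch. This is not the letter's cost function or weighting scheme, which the abstract does not specify. It is a generic far-field weighted least-squares fit, and it assumes a source in the plane of the array, a sound speed of 343 m/s, noise-free delays, and hypothetical helper names (`circular_array`, `ideal_delays`, `estimate_azimuth`).

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def circular_array(num_mics, radius):
    """Microphone (x, y) positions, equally spaced on a circle."""
    step = 2.0 * math.pi / num_mics
    return [(radius * math.cos(k * step), radius * math.sin(k * step))
            for k in range(num_mics)]


def ideal_delays(mics, azimuth):
    """Far-field delay t_i - t_j for every pair (i, j) with i < j.

    The wavefront reaches microphone i at t_i = -(p_i . u) / c, where u is
    the unit vector pointing from the array towards the source.
    """
    ux, uy = math.cos(azimuth), math.sin(azimuth)
    delays = {}
    for i, j in combinations(range(len(mics)), 2):
        dx = mics[j][0] - mics[i][0]
        dy = mics[j][1] - mics[i][1]
        delays[(i, j)] = (dx * ux + dy * uy) / SPEED_OF_SOUND
    return delays


def estimate_azimuth(mics, delays, weights):
    """Closed-form weighted least-squares azimuth, with no grid search.

    Minimizes sum_ij w_ij * (c * tau_ij - d_ij . u)^2 over u by solving the
    2x2 normal equations, then returns the angle of u in radians.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (i, j), tau in delays.items():
        w = weights[(i, j)]
        dx = mics[j][0] - mics[i][0]
        dy = mics[j][1] - mics[i][1]
        r = SPEED_OF_SOUND * tau  # path-length difference in metres
        a11 += w * dx * dx
        a12 += w * dx * dy
        a22 += w * dy * dy
        b1 += w * dx * r
        b2 += w * dy * r
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("array geometry or weights give a singular system")
    ux = (a22 * b1 - a12 * b2) / det
    uy = (a11 * b2 - a12 * b1) / det
    return math.atan2(uy, ux)


if __name__ == "__main__":
    mics = circular_array(9, 0.1)
    delays = ideal_delays(mics, math.radians(40.0))
    delays[(0, 1)] += 1e-3  # inject one outlier delay
    uniform = {pair: 1.0 for pair in delays}
    robust = dict(uniform)
    robust[(0, 1)] = 0.0  # down-weight the outlier pair
    print(math.degrees(estimate_azimuth(mics, delays, uniform)))
    print(math.degrees(estimate_azimuth(mics, delays, robust)))
```

In this sketch the cost is a single 2x2 linear solve per frame, whereas a grid search evaluates the steered response at every candidate direction. Setting the weight of a corrupted pair to zero removes its influence on the estimate; how the letter actually chooses its weights is not described in the abstract.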
