Localization of Multiple Speakers under High Reverberation using a Spherical Microphone Array and the Direct-Path Dominance Test

One of the major challenges encountered when localizing multiple speakers in real-world environments is the need to overcome the effect of multipath distortion due to room reverberation. A wide range of methods has been proposed for speaker localization, many based on microphone array processing. Some of these methods are designed for the localization of coherent sources, typical of multipath environments, and some have even reported limited robustness to reverberation. Nevertheless, speaker localization under conditions of high reverberation remains a challenging task. This paper proposes a novel multiple-speaker localization technique suitable for environments with high reverberation, based on a spherical microphone array and processing in the spherical harmonics (SH) domain. The non-stationarity and sparsity of speech, as well as frequency smoothing in the SH domain, are exploited in the development of a direct-path dominance test. This test identifies time-frequency (TF) bins that contain contributions from only one significant source and no significant contribution from room reflections, so that localization based on these selected TF bins is accurate and avoids the potential distortion due to other sources and reverberation. Computer simulations and an experiment in a real reverberant room validate the robustness of the proposed method in the presence of high reverberation.
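
The abstract does not spell out how the direct-path dominance test is computed, so the following is only a minimal sketch of one plausible realization. It assumes that the SH-domain STFT coefficients are already available and compensated for the array's mode strength, that a local correlation matrix is formed by averaging over a small window of neighbouring TF bins, and that a bin passes when this matrix is close to rank one, judged by the ratio of its two largest singular values. The function name `dpd_mask`, the window sizes `j_t` and `j_f`, and the value of `threshold` are all hypothetical choices made for illustration.

```python
import numpy as np

def dpd_mask(a_nm, j_t=2, j_f=2, threshold=4.0):
    """Flag TF windows whose smoothed SH-domain correlation is near rank one.

    a_nm      : complex array of shape (T, F, Q) holding Q SH coefficients
                per TF bin (assumed input format, not taken from the paper).
    j_t, j_f  : window size in time frames and frequency bins (assumed).
    threshold : required ratio of largest to second singular value (assumed).
    Returns a boolean array of shape (T - j_t + 1, F - j_f + 1).
    """
    T, F, Q = a_nm.shape
    # Per-bin outer products a a^H, shape (T, F, Q, Q).
    outer = a_nm[..., :, None] * a_nm[..., None, :].conj()
    n_t, n_f = T - j_t + 1, F - j_f + 1
    mask = np.zeros((n_t, n_f), dtype=bool)
    for t in range(n_t):
        for f in range(n_f):
            # Time and frequency smoothing over the local window.
            R = outer[t:t + j_t, f:f + j_f].mean(axis=(0, 1))
            s = np.linalg.svd(R, compute_uv=False)  # descending order
            # Compare by multiplication to avoid dividing by zero.
            mask[t, f] = s[0] > threshold * s[1]
    return mask
```

Under these assumptions, a window dominated by a single plane wave yields a correlation matrix with one dominant singular value and is kept, while a window containing several sources or strong reflections spreads energy over more singular values and is rejected. Direction estimation would then be run only on the bins where the mask is true.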
