A Robust Dual-Microphone Speech Source Localization Algorithm for Reverberant Environments

Speech source localization (SSL) using a microphone array aims to estimate the direction-of-arrival (DOA) of a speech source. However, its performance often degrades rapidly in reverberant environments. In this paper, a novel dual-microphone SSL algorithm is proposed to address this problem. First, the time-frequency regions dominated by direct sound are extracted by tracking the envelopes of speech, reverberation, and background noise, and the time-difference-of-arrival (TDOA) is then estimated from these reliable regions only. Second, a bin-wise de-aliasing strategy is introduced to make better use of the DOA information carried at high frequencies, where the spatial resolution is higher and there is typically less corruption by diffuse noise. Our experiments show that, compared with other widely used algorithms, the proposed algorithm performs more reliably in realistic reverberant environments.
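
To illustrate why high-frequency bins need de-aliasing at all, the following is a minimal sketch of one generic way to resolve the phase ambiguity in a single frequency bin. It is not the algorithm proposed in the paper, whose details are not given in the abstract. It assumes a far-field source, a phase convention of phase = 2*pi*f*TDOA, a DOA measured from broadside, a speed of sound of 343 m/s, and the availability of a coarse TDOA estimate (for example from unaliased low-frequency bins). All function and parameter names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def candidate_tdoas(wrapped_phase, freq_hz, mic_spacing_m, c=SPEED_OF_SOUND):
    """Return every TDOA consistent with one wrapped inter-microphone phase.

    Assumes phase = 2*pi*f*tau, so the phase only determines tau up to
    integer multiples of 1/f. Candidates beyond the physical limit
    |tau| <= d/c are discarded.
    """
    max_tdoa = mic_spacing_m / c
    k_max = int(math.floor(freq_hz * max_tdoa + 0.5))
    candidates = []
    for k in range(-k_max - 1, k_max + 2):
        tau = (wrapped_phase + 2.0 * math.pi * k) / (2.0 * math.pi * freq_hz)
        if abs(tau) <= max_tdoa:
            candidates.append(tau)
    return candidates


def dealias_bin(wrapped_phase, freq_hz, mic_spacing_m, coarse_tdoa,
                c=SPEED_OF_SOUND):
    """Pick the candidate TDOA closest to a coarse prior estimate."""
    candidates = candidate_tdoas(wrapped_phase, freq_hz, mic_spacing_m, c)
    if not candidates:
        return None
    return min(candidates, key=lambda tau: abs(tau - coarse_tdoa))


def tdoa_to_doa_deg(tdoa, mic_spacing_m, c=SPEED_OF_SOUND):
    """Far-field conversion of a TDOA to a DOA in degrees from broadside."""
    sine = max(-1.0, min(1.0, c * tdoa / mic_spacing_m))
    return math.degrees(math.asin(sine))


if __name__ == "__main__":
    spacing = 0.2   # metres; wide enough that a 4 kHz bin is spatially aliased
    freq = 4000.0   # Hz
    true_tdoa = spacing * math.sin(math.radians(30.0)) / SPEED_OF_SOUND
    full_phase = 2.0 * math.pi * freq * true_tdoa
    # Wrap the phase to (-pi, pi], as a measured cross-spectrum phase would be.
    wrapped = math.atan2(math.sin(full_phase), math.cos(full_phase))
    refined = dealias_bin(wrapped, freq, spacing, coarse_tdoa=2.8e-4)
    print(tdoa_to_doa_deg(refined, spacing))
```

With a 0.2 m spacing, the 4 kHz bin in this example admits several physically valid TDOA candidates; the coarse estimate selects the one corresponding to a source at roughly 30 degrees. Choosing the nearest candidate is only one possible disambiguation rule and depends on the coarse estimate being accurate to within half a period of the bin frequency.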
