Frequency-Sliding Generalized Cross-Correlation: A Sub-Band Time Delay Estimation Approach

The generalized cross-correlation (GCC) is regarded as the most popular approach for estimating the time difference of arrival (TDOA) between the signals received at two sensors. Time delay estimates are obtained by maximizing the GCC output, where the direct-path delay is usually observed as a prominent peak. Moreover, GCCs also play an important role in steered response power (SRP) localization algorithms, where the SRP functional can be written as an accumulation of the GCCs computed from multiple sensor pairs. Unfortunately, the accuracy of TDOA estimates is affected by multiple factors, including noise, reverberation, and signal bandwidth. This paper presents a sub-band approach to time delay estimation aimed at improving the performance of the conventional GCC. The proposed method extracts multiple GCCs, each corresponding to a different frequency band of the cross-power spectrum phase, in a sliding-window fashion. The major contributions of this paper are: 1) a sub-band GCC representation of the cross-power spectrum phase that, despite having a reduced temporal resolution, provides a more suitable representation for estimating the true TDOA; 2) a demonstration that this matrix representation is rank one in the ideal noiseless case, a property that is exploited in more adverse scenarios to obtain a more robust and accurate GCC; 3) a set of low-rank approximation alternatives for processing the sub-band GCC matrix, leading to better TDOA estimates and source localization performance. An extensive set of experiments is presented to demonstrate the validity of the proposed approach.
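
The sliding-window idea and the rank-one property can be illustrated with a short NumPy sketch. It is not the authors' implementation: the function names, the Hann window, the band width and hop (128 and 32 bins), the zero-padding of each sub-band to the full signal length, and the way the rank-one approximation is re-modulated and summed back into a single GCC are all assumptions made for illustration. The delay is circular and integer-valued so that the ideal model holds exactly in the noiseless case.

```python
import numpy as np


def phat_spectrum(x1, x2, eps=1e-12):
    """One-sided cross-power spectrum phase (PHAT-normalized) of two real signals."""
    X1 = np.fft.rfft(x1)
    X2 = np.fft.rfft(x2)
    cross = X2 * np.conj(X1)  # phase is -2*pi*k*d/N when x2 lags x1 by d samples
    return cross / (np.abs(cross) + eps)


def subband_gcc_matrix(phat, n_lags, band=128, hop=32):
    """Stack complex sub-band GCCs as columns, one per position of a
    window sliding along frequency (band and hop are assumed values)."""
    window = np.hanning(band)
    starts = np.arange(0, len(phat) - band + 1, hop)
    # Each windowed segment is treated as a baseband spectrum and zero-padded
    # to n_lags before the inverse FFT, so every column shares the same lag axis.
    cols = [np.fft.ifft(phat[s:s + band] * window, n=n_lags) for s in starts]
    return np.stack(cols, axis=1), starts


def rank_one_gcc(G, starts):
    """Keep the dominant singular component of G, shift each band back to its
    original frequency position, and sum the bands into one real-valued GCC."""
    U, S, Vh = np.linalg.svd(G, full_matrices=False)
    G1 = S[0] * np.outer(U[:, 0], Vh[0])
    n = np.arange(G.shape[0])[:, None]
    carriers = np.exp(2j * np.pi * starts[None, :] * n / G.shape[0])
    return np.real(np.sum(G1 * carriers, axis=1))


rng = np.random.default_rng(0)
N, true_delay = 1024, 7
s = rng.standard_normal(N)

# Noiseless case: the second singular value is negligible next to the first.
G_clean, starts = subband_gcc_matrix(phat_spectrum(s, np.roll(s, true_delay)), N)
sv = np.linalg.svd(G_clean, compute_uv=False)
rank_one_ratio = sv[1] / sv[0]

# Noisy case (about 10 dB SNR per channel): estimate the delay from the
# rank-one approximation. Lags above N/2 would correspond to negative delays.
noise_std = 10 ** (-10 / 20)
x1 = s + noise_std * rng.standard_normal(N)
x2 = np.roll(s, true_delay) + noise_std * rng.standard_normal(N)
G_noisy, _ = subband_gcc_matrix(phat_spectrum(x1, x2), N)
gcc = rank_one_gcc(G_noisy, starts)
estimated_delay = int(np.argmax(gcc))

print("second/first singular value (noiseless):", rank_one_ratio)
print("estimated delay in samples:", estimated_delay)
```

In this sketch each column of the matrix is a smoothed, complex-valued GCC whose main lobe is wider than that of the full-band GCC, which is one way to see the reduced temporal resolution mentioned above. Truncating the SVD to its first component is only one possible low-rank approximation; weighted variants would replace the plain `np.linalg.svd` step.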
