Time-of-arrival estimation for blind beamforming

Ad-hoc arrays formed by mobile devices are increasingly available for capturing audio and video at social events. Applying spatial signal processing algorithms such as beamforming to the microphone signals of these arrays is hindered by the unknown locations of the devices and the lack of temporal synchronization between them. Self-calibration methods can estimate these missing parameters, but they typically impose restrictions and require time to converge. Time difference of arrival (TDOA) values contain source-related spatial information and have previously been used in source localization and tracking. In this work, relative time-of-arrival (TOA) is proposed for estimating source spatial information, and the method is applied to beamforming with ad-hoc arrays. Simulations and measurements with smartphones are used to test the accuracy of the proposed TOA estimators. Speech captured by a smartphone array is then beamformed using the TOA estimates. Results show that Kalman-filter-based TOA steering achieves enhancement performance similar to that obtained with the ground-truth TOA.
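As a rough illustration of how relative TOA values can steer a beamformer, the following sketch aligns each channel by its relative TOA and averages the result (a plain delay-and-sum beamformer). It is not the estimator or beamformer evaluated in this work: the function name `align_and_sum`, the use of circular frequency-domain shifts, and the assumption that the relative TOAs are already known in samples are all illustrative choices.

```python
import numpy as np

def align_and_sum(signals, rel_toa_samples):
    """Delay-and-sum sketch: advance each channel by its relative TOA, then average.

    signals:         array of shape (channels, samples)
    rel_toa_samples: relative TOA of the source at each channel, in samples
                     (assumed known here; in practice it must be estimated)
    """
    signals = np.asarray(signals, dtype=float)
    n = signals.shape[1]
    freqs = np.fft.rfftfreq(n)  # normalized frequency, cycles per sample
    spectra = np.fft.rfft(signals, axis=1)
    toa = np.asarray(rel_toa_samples, dtype=float)[:, None]
    # A delay of tau samples multiplies the spectrum by exp(-j*2*pi*f*tau);
    # multiplying by the conjugate phase undoes it (circularly).
    steering = np.exp(2j * np.pi * freqs[None, :] * toa)
    aligned = np.fft.irfft(spectra * steering, n=n, axis=1)
    return aligned.mean(axis=0)

# Toy example: one source observed with different (circular) delays per device.
rng = np.random.default_rng(0)
source = rng.standard_normal(256)
delays = np.array([0, 3, 7])  # hypothetical relative TOAs in samples
mics = np.stack([np.roll(source, d) for d in delays])
output = align_and_sum(mics, delays)
```

In this noise-free toy case the aligned average reproduces the source, whereas averaging the unaligned channels does not. With real recordings the relative TOAs would come from an estimator and would generally be fractional and time-varying.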
