Speaker localization and tracking in the presence of sound interference by exploiting speech harmonicity

The performance of conventional acoustic source localization and tracking systems degrades significantly when reverberation, noise, and acoustic interference are present. In this paper, we propose a robust speaker tracking algorithm for enclosed environments in the presence of interference and noise. We exploit harmonic structure, a distinctive feature of speech, to improve robustness against acoustic interference. To extract the speech harmonic information, a beamformer is employed to enhance the signal arriving from a previously estimated source location. A new particle weight update is then computed from the steered response power function, given the estimated speech harmonic information. Simulation results show that the proposed method localizes and tracks a speech source robustly in the presence of interference, noise, and reverberation.
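
The idea of weighting particles by a steered response power (SRP) restricted to speech harmonics can be illustrated with a minimal sketch. This is not the paper's weight update: the abstract does not give its form, so the code below assumes a plain delay-and-sum SRP evaluated only at the frequency bins nearest the harmonics of a pitch `f0` that is taken as already estimated (for example from the beamformer output). All function names, parameters, and the free-field propagation model are hypothetical and chosen for illustration.

```python
import numpy as np

def harmonic_bins(freqs, f0, n_harm):
    """Indices of the frequency bins closest to the first n_harm harmonics of f0."""
    targets = f0 * np.arange(1, n_harm + 1)
    targets = targets[targets <= freqs[-1]]          # drop harmonics above the band
    return np.unique(np.abs(freqs[:, None] - targets[None, :]).argmin(axis=0))

def harmonic_srp(X, freqs, mics, pos, f0, n_harm=10, c=343.0):
    """Delay-and-sum steered response power at `pos`, summed over harmonic bins only.

    X     : (M, F) complex spectrum of one frame, one row per microphone
    freqs : (F,) bin centre frequencies in Hz, ascending
    mics  : (M, D) microphone coordinates in metres
    pos   : (D,) candidate source position in metres
    """
    bins = harmonic_bins(freqs, f0, n_harm)
    delays = np.linalg.norm(mics - pos, axis=1) / c   # assumed free-field delays
    # Phase advance that undoes the propagation delay from `pos` to each microphone.
    steer = np.exp(2j * np.pi * freqs[bins][None, :] * delays[:, None])
    beam = np.sum(X[:, bins] * steer, axis=0)         # coherent sum across microphones
    return float(np.sum(np.abs(beam) ** 2))

def update_weights(weights, particles, X, freqs, mics, f0, **kwargs):
    """Multiply each particle weight by its harmonic SRP, then renormalize."""
    srp = np.array([harmonic_srp(X, freqs, mics, p, f0, **kwargs) for p in particles])
    w = weights * (srp + 1e-12)                       # small floor avoids all-zero weights
    return w / w.sum()
```

Because interference that lacks energy at the speaker's harmonic frequencies contributes little to this restricted sum, particles near the speaker tend to receive larger weights than particles near the interferer. A full tracker would also need a prediction step, resampling, and a pitch estimator, none of which are shown here.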
