Localization and tracking for simultaneous speakers based on time-frequency method and Probability Hypothesis Density filter

In this paper we present a two-step system for localizing and tracking simultaneous speakers. The localization algorithm is based on a time-frequency method that uses an array of three microphones and can locate multiple sound sources within a single time frame. The localization results, which contain missed detections and clutter, are post-processed by a tracking algorithm based on the Probability Hypothesis Density (PHD) filter to estimate a smoothed trajectory for each speaker. Experiments on real recorded data show that our method outperforms an algorithm based on the multi-target particle filter (MTPF) and is effective in practical human-robot interaction applications.
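To make the tracking stage more concrete, the following is a minimal sketch of one recursion of a Gaussian-mixture form of the PHD filter. It is an illustration under stated assumptions, not the authors' implementation: the state is taken to be a single direction-of-arrival angle in degrees with a random-walk motion model, angle wrap-around is ignored, and every name and parameter value (`phd_step`, `merge`, `extract`, `p_detect`, `clutter_density`, the birth component, and so on) is hypothetical. Each mixture component is a `(weight, mean, variance)` tuple, and the sum of the weights approximates the expected number of speakers.

```python
import math

def gauss(z, mean, var):
    """1-D Gaussian density N(z; mean, var)."""
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def merge(components, gate=4.0):
    """Greedily merge components whose means lie close to the strongest one."""
    remaining = sorted(components, key=lambda c: c[0], reverse=True)
    merged = []
    while remaining:
        _, m0, v0 = remaining[0]
        close = [c for c in remaining if (c[1] - m0) ** 2 / v0 <= gate]
        remaining = [c for c in remaining if (c[1] - m0) ** 2 / v0 > gate]
        w = sum(c[0] for c in close)
        m = sum(c[0] * c[1] for c in close) / w
        v = sum(c[0] * (c[2] + (c[1] - m) ** 2) for c in close) / w
        merged.append((w, m, v))
    return merged

def phd_step(components, measurements, births, *, p_survive=0.99, p_detect=0.9,
             q_var=4.0, r_var=9.0, clutter_density=1e-3, prune_below=1e-4):
    """One predict/update cycle; all parameter values are assumed, not from the paper."""
    # Prediction: surviving components diffuse (random walk), births are appended.
    predicted = [(p_survive * w, m, v + q_var) for (w, m, v) in components]
    predicted += list(births)
    # Missed-detection terms keep each predicted component with reduced weight.
    updated = [((1 - p_detect) * w, m, v) for (w, m, v) in predicted]
    # Detection terms: a Kalman update of every component with every measurement.
    for z in measurements:
        terms = []
        for w, m, v in predicted:
            s = v + r_var            # innovation variance
            k = v / s                # Kalman gain
            terms.append((p_detect * w * gauss(z, m, s), m + k * (z - m), (1 - k) * v))
        norm = clutter_density + sum(t[0] for t in terms)
        updated += [(w / norm, m, v) for (w, m, v) in terms]
    # Prune negligible components, then merge near-duplicates.
    return merge([c for c in updated if c[0] >= prune_below])

def extract(components, threshold=0.5):
    """Report the means of components heavy enough to count as a speaker."""
    return sorted(m for (w, m, v) in components if w > threshold)

# Synthetic frames (degrees): two static speakers, one clutter point, one missed detection.
frames = [
    [-30.0, 40.0],
    [-30.0, 40.0],
    [-30.0, 40.0, 120.0],   # 120.0 plays the role of clutter
    [-30.0],                # the speaker at 40 degrees is missed in this frame
    [-30.0, 40.0],
    [-30.0, 40.0],
]
births = [(0.1, 0.0, 2500.0)]   # one broad, assumed birth component
components = []
for frame in frames:
    components = phd_step(components, frame, births)
estimates = extract(components)
print(estimates)
```

In this sketch a single missed detection lowers a component's weight instead of deleting it, and a clutter point only spawns a low-weight component that fades over the following frames. That is the mechanism by which a PHD-style tracker can smooth localization output containing both kinds of error.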
