An Application of a Particle Filter to Bayesian Multiple Sound Source Tracking with Audio and Video Information Fusion

Abstract – A particle filter is applied to the problem of detecting and tracking multiple sound sources by Bayesian inference on combined audio and video information. The problem is formulated within a general framework of Bayesian hidden-variable sequence estimation that fuses the observed information. The particle filter is then introduced as an approximation of Bayesian inference. Experiments on real-world data demonstrate that the proposed method works well in ordinary environments such as a meeting room. Compared with exact Bayesian inference, the computational cost of estimation is reduced significantly while the quality of estimation is maintained.
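
To make the idea of a particle filter as an approximation of Bayesian inference concrete, the following is a minimal sketch of a generic bootstrap particle filter, not the method of the paper. It assumes a single source whose state is a one-dimensional direction in degrees, a random-walk motion model, and Gaussian audio and video observation likelihoods that are conditionally independent given the state. The function names (`gaussian_pdf`, `particle_filter_step`, `track`) and all noise parameters are hypothetical choices made for illustration; the multiple-source detection described in the abstract is not covered.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used as a toy observation likelihood."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def particle_filter_step(particles, audio_obs, video_obs, rng,
                         motion_std=2.0, audio_std=10.0, video_std=5.0):
    """One predict / weight / resample cycle of a bootstrap particle filter.

    All noise levels are illustrative assumptions, not values from the paper.
    """
    # Predict: propagate each particle through a random-walk motion model.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]

    # Weight: fuse the two modalities by multiplying their likelihoods,
    # which assumes they are conditionally independent given the state.
    weights = [gaussian_pdf(audio_obs, p, audio_std) *
               gaussian_pdf(video_obs, p, video_std) for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # Every particle is far from the observations: fall back to uniform.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Estimate: posterior mean under the weighted particle approximation.
    estimate = sum(w * p for w, p in zip(weights, predicted))

    # Resample: draw particles in proportion to their weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


def track(audio_seq, video_seq, n_particles=500, seed=0):
    """Run the filter over paired audio/video observation sequences."""
    rng = random.Random(seed)
    # Assumed prior: direction uniformly distributed over -90..90 degrees.
    particles = [rng.uniform(-90.0, 90.0) for _ in range(n_particles)]
    estimates = []
    for audio_obs, video_obs in zip(audio_seq, video_seq):
        particles, est = particle_filter_step(particles, audio_obs, video_obs, rng)
        estimates.append(est)
    return estimates
```

Calling `track([30.0] * 20, [30.0] * 20)` returns one direction estimate per time step. The cost of each step grows linearly with the number of particles, which is the kind of trade-off between computation and estimation quality that the abstract refers to.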
