Paper Information - An Application of a Particle Filter to Bayesian Multiple Sound Source Tracking with Audio and Video Information Fusion

Abstract – A particle filter is applied to the problem of detecting and tracking multiple sound sources by Bayesian inference using combined audio and video information. The problem is formulated within a general framework of Bayesian hidden variable sequence estimation by fusing observed information. The particle filter is then introduced as an approximation of Bayesian inference. Experiments using real-world data demonstrate that the proposed method works well in ordinary environments such as a meeting room. The computational cost of estimation is reduced significantly compared to exact Bayesian inference, while maintaining the quality of estimation.
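
The idea of approximating Bayesian inference with a particle filter can be illustrated with a minimal sketch. This is not the paper's model: it tracks a single source angle rather than detecting and tracking multiple sources, and the random-walk motion model, the Gaussian audio and video likelihoods, all noise levels, and the names `particle_filter_step` and `track` are assumptions chosen only for illustration. The two modalities are fused by multiplying their likelihoods, which assumes they are conditionally independent given the source position.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, used here as a toy observation likelihood."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def particle_filter_step(particles, audio_obs, video_obs, rng,
                         process_std=2.0, audio_std=10.0, video_std=5.0):
    """One predict / weight / resample cycle of a bootstrap particle filter."""
    # Predict: propagate each particle through an assumed random-walk motion model.
    predicted = [p + rng.gauss(0.0, process_std) for p in particles]
    # Weight: fuse the two modalities by multiplying their likelihoods
    # (assumes conditional independence given the state).
    weights = [gaussian_pdf(audio_obs, p, audio_std) *
               gaussian_pdf(video_obs, p, video_std) for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # Every particle is far from the observations; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]
    # Estimate: approximate the posterior mean by the weighted particle average.
    estimate = sum(w * p for w, p in zip(weights, predicted))
    # Resample: draw an equally weighted particle set in proportion to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


def track(true_angles, n_particles=500, seed=0):
    """Run the filter over simulated noisy audio and video angle observations."""
    rng = random.Random(seed)
    # Start from a uniform prior over the assumed angular range (degrees).
    particles = [rng.uniform(-90.0, 90.0) for _ in range(n_particles)]
    estimates = []
    for angle in true_angles:
        audio_obs = angle + rng.gauss(0.0, 10.0)  # simulated, noisier modality
        video_obs = angle + rng.gauss(0.0, 5.0)   # simulated, more precise modality
        particles, estimate = particle_filter_step(particles, audio_obs, video_obs, rng)
        estimates.append(estimate)
    return estimates
```

Calling `track([0.5 * t for t in range(40)])` returns one angle estimate per time step. The cost per step grows linearly with the number of particles, which is the kind of trade-off against exact inference that the abstract refers to.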
