Audio-Visual Data Fusion Using a Particle Filter in the Application of Face Recognition

This paper describes a methodology by which audio and visual data about a scene can be fused in a meaningful manner in order to locate a speaker within it. The fusion is implemented within a Particle Filter so that a single speaker can be identified in the presence of multiple visual observations. The advantage of this fusion is that weak sensory data from either modality can be reinforced by the other, and the influence of noise can be reduced.
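To make the fusion idea concrete, the following is a minimal sketch of a particle filter whose weight update multiplies an audio likelihood by a visual likelihood. All specifics here are assumptions for illustration, not the paper's implementation: Gaussian observation models for both modalities, a 2-D scene plane, a random-walk motion model, and the constants `AUDIO_STD`, `VISUAL_STD`, `TRUE_POS`, and `DISTRACTOR` are invented. The visual likelihood alone is bimodal (two detected faces); multiplying in the coarse audio likelihood disambiguates the speaker, which is the effect the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical scene setup (illustrative values, not from the paper) ---
N_PARTICLES = 500
TRUE_POS = np.array([3.0, 1.5])      # speaker position in the scene plane
AUDIO_STD = 0.8                      # audio localisation: noisy but unambiguous
VISUAL_STD = 0.2                     # visual detection: precise but multi-modal
DISTRACTOR = np.array([-2.0, 0.5])   # a second face that is not speaking

def audio_likelihood(particles, audio_obs):
    """Gaussian likelihood of a noisy audio position estimate."""
    d2 = np.sum((particles - audio_obs) ** 2, axis=1)
    return np.exp(-0.5 * d2 / AUDIO_STD ** 2)

def visual_likelihood(particles, detections):
    """Mixture likelihood over multiple face detections."""
    lik = np.zeros(len(particles))
    for det in detections:
        d2 = np.sum((particles - det) ** 2, axis=1)
        lik += np.exp(-0.5 * d2 / VISUAL_STD ** 2)
    return lik

# Initialise particles uniformly over the scene
particles = rng.uniform(-5, 5, size=(N_PARTICLES, 2))
weights = np.full(N_PARTICLES, 1.0 / N_PARTICLES)

for step in range(20):
    # Predict: simple random-walk motion model
    particles += rng.normal(0.0, 0.1, size=particles.shape)

    # Simulated observations for this time step
    audio_obs = TRUE_POS + rng.normal(0.0, AUDIO_STD, size=2)
    detections = [TRUE_POS + rng.normal(0.0, VISUAL_STD, size=2),
                  DISTRACTOR + rng.normal(0.0, VISUAL_STD, size=2)]

    # Update: fuse modalities by multiplying per-modality likelihoods
    weights *= (audio_likelihood(particles, audio_obs)
                * visual_likelihood(particles, detections))
    weights += 1e-300                 # guard against total weight collapse
    weights /= weights.sum()

    # Resample (systematic) when the effective sample size drops
    if 1.0 / np.sum(weights ** 2) < N_PARTICLES / 2:
        positions = (rng.random() + np.arange(N_PARTICLES)) / N_PARTICLES
        idx = np.searchsorted(np.cumsum(weights), positions)
        particles = particles[idx]
        weights = np.full(N_PARTICLES, 1.0 / N_PARTICLES)

estimate = np.average(particles, weights=weights, axis=0)
print("estimated speaker position:", estimate)  # converges near TRUE_POS
```

Running this sketch, the weighted mean of the particles settles near `TRUE_POS` rather than the non-speaking distractor, illustrating how a weak but unambiguous modality (audio) reinforces the correct mode of a strong but ambiguous one (vision) while the multiplicative update suppresses noise in either channel.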