Audio-Visual Data Fusion Using a Particle Filter in the Application of Face Recognition

This paper describes a methodology by which audio and visual data about a scene can be fused in a meaningful manner in order to locate a speaker within it. The fusion is implemented within a Particle Filter so that a single speaker can be identified in the presence of multiple visual observations. The advantage of this fusion is that weak sensory data from either modality can be reinforced by the other, and the influence of noise can be reduced.
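To make the fusion idea concrete, the following is a minimal sketch of a particle filter whose weight update multiplies an audio likelihood by a visual likelihood. All specifics here are assumptions for illustration, not the paper's implementation: Gaussian observation models for both modalities, a 2-D scene plane, a random-walk motion model, and the constants `AUDIO_STD`, `VISUAL_STD`, `TRUE_POS`, and `DISTRACTOR` are invented. The visual likelihood alone is bimodal (two detected faces); multiplying in the coarse audio likelihood disambiguates the speaker, which is the effect the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical scene setup (illustrative values, not from the paper) ---
N_PARTICLES = 500
TRUE_POS = np.array([3.0, 1.5])      # speaker position in the scene plane
AUDIO_STD = 0.8                      # audio localisation: noisy but unambiguous
VISUAL_STD = 0.2                     # visual detection: precise but multi-modal
DISTRACTOR = np.array([-2.0, 0.5])   # a second face that is not speaking

def audio_likelihood(particles, audio_obs):
    """Gaussian likelihood of a noisy audio position estimate."""
    d2 = np.sum((particles - audio_obs) ** 2, axis=1)
    return np.exp(-0.5 * d2 / AUDIO_STD ** 2)

def visual_likelihood(particles, detections):
    """Mixture likelihood over multiple face detections."""
    lik = np.zeros(len(particles))
    for det in detections:
        d2 = np.sum((particles - det) ** 2, axis=1)
        lik += np.exp(-0.5 * d2 / VISUAL_STD ** 2)
    return lik

# Initialise particles uniformly over the scene
particles = rng.uniform(-5, 5, size=(N_PARTICLES, 2))
weights = np.full(N_PARTICLES, 1.0 / N_PARTICLES)

for step in range(20):
    # Predict: simple random-walk motion model
    particles += rng.normal(0.0, 0.1, size=particles.shape)

    # Simulated observations for this time step
    audio_obs = TRUE_POS + rng.normal(0.0, AUDIO_STD, size=2)
    detections = [TRUE_POS + rng.normal(0.0, VISUAL_STD, size=2),
                  DISTRACTOR + rng.normal(0.0, VISUAL_STD, size=2)]

    # Update: fuse modalities by multiplying per-modality likelihoods
    weights *= (audio_likelihood(particles, audio_obs)
                * visual_likelihood(particles, detections))
    weights += 1e-300                 # guard against total weight collapse
    weights /= weights.sum()

    # Resample (systematic) when the effective sample size drops
    if 1.0 / np.sum(weights ** 2) < N_PARTICLES / 2:
        positions = (rng.random() + np.arange(N_PARTICLES)) / N_PARTICLES
        idx = np.searchsorted(np.cumsum(weights), positions)
        particles = particles[idx]
        weights = np.full(N_PARTICLES, 1.0 / N_PARTICLES)

estimate = np.average(particles, weights=weights, axis=0)
print("estimated speaker position:", estimate)  # converges near TRUE_POS
```

Running this sketch, the weighted mean of the particles settles near `TRUE_POS` rather than the non-speaking distractor, illustrating how a weak but unambiguous modality (audio) reinforces the correct mode of a strong but ambiguous one (vision) while the multiplicative update suppresses noise in either channel.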