Person Identification Based on Multichannel and Multimodality Fusion

Person ID is a very useful information for high level video analysis and retrieval. In some scenario, the recording is not only multimodality and also multichannel (microphone array, camera array). In this paper, we describe a Multimodal person ID system base on multichannel and multimodal fusion. The audio only system is combining 7 channel microphone recording at decision output individual audio-only system. The modeling technique of audio system is Universal Background Model (UBM) and Maximum a Posterior adaptation framework which is very popular in speaker recognition literature. The visual only system works directly on the appearance space via l1 norm and nearest neighbor classifier. The linear fusion is then combining the two modalities to improve the ID performance. The experiments indicate the effectiviness of micropohone array fusion and audio/visual fusion.