Paper Information - Multichannel and Multimodality Person Identification

A person's identity is important high-level information for video analysis and retrieval. As multimedia data grows, recordings are increasingly not only multimodal but also multichannel (microphone arrays, camera arrays). In this paper, we describe the multimodal person identification system of the UIUC team for the CLEAR 2007 evaluation. The audio-only system is based on a newly proposed model, the Chain of Gaussian Mixtures. The visual-only system is a face recognition module based on a nearest-neighbor classifier in appearance space. The final system fuses 7 channels of microphone recordings and 4 camera recordings at the decision level. The experimental results indicate the effectiveness of the speaker modeling methods and the fusion scheme.
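To make decision-level fusion concrete, the sketch below combines per-channel identity scores into a single decision. The abstract does not state the fusion rule, so the weighted sum of softmax-normalized scores used here is an assumption for illustration only, and the names `softmax` and `fuse_decisions` are hypothetical rather than taken from the paper.

```python
import math


def softmax(scores):
    """Normalize one channel's raw per-identity scores so they sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decisions(channel_scores, weights=None):
    """Fuse per-channel scores at the decision level.

    channel_scores: one list per channel (microphone or camera), each
        holding a raw score for every enrolled identity.
    weights: optional per-channel weights; equal weights are assumed
        when none are given (an illustrative choice, not the paper's).
    Returns (index of the best identity, fused score per identity).
    """
    if weights is None:
        weights = [1.0 / len(channel_scores)] * len(channel_scores)
    n_ids = len(channel_scores[0])
    fused = [0.0] * n_ids
    for weight, scores in zip(weights, channel_scores):
        for i, p in enumerate(softmax(scores)):
            fused[i] += weight * p
    best = max(range(n_ids), key=fused.__getitem__)
    return best, fused


if __name__ == "__main__":
    # Toy scores for 3 identities: 2 microphone channels and 1 camera.
    audio = [[2.0, 0.5, 0.1], [1.5, 1.4, 0.2]]
    video = [[0.2, 2.5, 0.1]]
    best, fused = fuse_decisions(audio + video)
    print(best, fused)
```

In the setting described above, `channel_scores` would hold 11 lists (7 microphone channels and 4 cameras). Because each channel is normalized before the sum, audio and visual scores on different scales can be combined; the weights could be tuned on held-out data if some channels prove more reliable.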
