On the fusion of prosody, voice spectrum and face features for multimodal person verification

Multimodal person recognition systems normally use short-term spectral features as voice information. In this paper, prosodic information is added to a system based on face and voice spectrum features. Using two fusion techniques, support vector machines and matcher weighting, we propose several strategies that fuse the monomodal scores in multiple steps. Adding prosodic information clearly improves system performance, and the best results are achieved when the prosodic scores are first fused with each other and the resulting scores are then fused with the spectral and facial scores. Speech and face scores were obtained on the Switchboard-I and XM2VTS databases, respectively.

Index Terms: speaker recognition, multimodality, fusion, prosody, voice spectrum, face
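The multi-step strategy described above can be illustrated with a minimal sketch of matcher weighting. It assumes the common formulation in which scores are min-max normalized and each matcher's weight is inversely proportional to its equal error rate (EER); the abstract does not state the exact normalization used. The function names, scores, and EER values below are hypothetical and serve only as an illustration.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Map raw matcher scores to [0, 1] using the min and max of the given set."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: all scores equal, so no ranking information.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def matcher_weights(eers: Sequence[float]) -> List[float]:
    """Weights inversely proportional to each matcher's EER, summing to 1."""
    inverse = [1.0 / e for e in eers]
    total = sum(inverse)
    return [v / total for v in inverse]


def fuse(score_lists: Sequence[Sequence[float]], eers: Sequence[float]) -> List[float]:
    """Weighted sum of normalized scores; score_lists[m][i] is matcher m, trial i."""
    weights = matcher_weights(eers)
    n_trials = len(score_lists[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, score_lists))
        for i in range(n_trials)
    ]


# Hypothetical raw scores for three verification trials (not from the paper).
prosody_a = min_max_normalize([0.2, 1.4, 0.9])
prosody_b = min_max_normalize([10.0, 35.0, 20.0])
spectral = min_max_normalize([-3.1, 2.2, 0.5])
face = min_max_normalize([0.30, 0.85, 0.40])

# Step 1: fuse the prosodic scores with each other (EERs are assumed values).
prosody_fused = fuse([prosody_a, prosody_b], eers=[0.20, 0.25])

# Step 2: fuse the prosodic result with spectral and facial scores.
# The EER of the fused prosodic score (0.15) is also an assumed value; in
# practice it would be measured on a development set.
final_scores = fuse([prosody_fused, spectral, face], eers=[0.15, 0.08, 0.05])
print(final_scores)
```

In the second step the fused prosodic score is treated as a single matcher, so its weight depends on how well the prosodic features perform together rather than individually. An SVM-based fusion would replace `fuse` with a classifier trained on the vector of monomodal scores.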
