Audio-visual person verification

In this paper we investigate the benefits of classifier combination (fusion) in a multimodal system for personal identity verification. The system uses frontal face images and speech. We show that a sophisticated fusion strategy enables the system to outperform its facial and vocal modules taken separately. We show that both trained linear weighted schemes and fusion by a Support Vector Machine classifier lead to a significant reduction in total error rates. The complete system is tested on data from a publicly available audio-visual database (XM2VTS, 295 subjects) according to a published protocol.
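
As a rough illustration of what a trained linear weighted fusion scheme can look like, the sketch below combines a face score and a voice score as w * face + (1 - w) * voice and picks w by grid search so that the total error (false acceptance rate plus false rejection rate) on a training set is smallest. The abstract does not say how the weights are trained, so the grid search, the function names, and the toy scores are all assumptions made for illustration. They are not the paper's method or its data.

```python
def total_error(scores, labels, threshold):
    """False acceptance rate + false rejection rate at a given threshold.

    labels: 1 = true client claim, 0 = impostor claim.
    A claim is accepted when its score is >= threshold.
    """
    clients = [s for s, y in zip(scores, labels) if y == 1]
    impostors = [s for s, y in zip(scores, labels) if y == 0]
    far = sum(s >= threshold for s in impostors) / len(impostors)
    frr = sum(s < threshold for s in clients) / len(clients)
    return far + frr


def best_threshold_error(scores, labels):
    """Smallest total error over all candidate thresholds."""
    # Candidates: every observed score, plus one value above the maximum
    # (which rejects every claim).
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    return min(total_error(scores, labels, t) for t in candidates)


def fuse(face, voice, w):
    """Linear weighted fusion of two per-claim score lists."""
    return [w * f + (1.0 - w) * v for f, v in zip(face, voice)]


def train_weight(face, voice, labels, steps=20):
    """Grid-search w in [0, 1]; return (w, total error on the training set)."""
    err, w = min(
        (best_threshold_error(fuse(face, voice, k / steps), labels), k / steps)
        for k in range(steps + 1)
    )
    return w, err


# Hypothetical toy scores (not XM2VTS data): four client claims, four impostor claims.
face = [0.9, 0.8, 0.4, 0.7, 0.6, 0.3, 0.2, 0.5]
voice = [0.7, 0.3, 0.8, 0.9, 0.2, 0.6, 0.1, 0.4]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

face_err = best_threshold_error(face, labels)
voice_err = best_threshold_error(voice, labels)
w, fused_err = train_weight(face, voice, labels)
print(f"face only: {face_err:.2f}, voice only: {voice_err:.2f}, "
      f"fused (w={w:.2f}): {fused_err:.2f}")
```

Because the grid includes w = 0 and w = 1, the fused error on the training set can never exceed that of the better single modality. Whether the gain carries over to unseen data is what an evaluation protocol with separate training and test sets is meant to check.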
