Towards robust person recognition on handheld devices using face and speaker identification technologies

Most face and speaker identification techniques are tested on data collected in controlled environments using high-quality cameras and microphones. Using these technologies in variable environments, with the inexpensive audio and image capture hardware found in mobile devices, poses an additional challenge. In this study, we investigate the application of existing face and speaker identification techniques to a person identification task on a handheld device. These techniques have been shown to perform accurately in tightly constrained experiments, where the lighting conditions, visual backgrounds, and audio environments are fixed and specifically adjusted for optimal data quality. When they are applied on mobile devices, where the visual and audio conditions are highly variable, degradations in performance can be expected. Under these circumstances, combining multiple biometric modalities can improve the robustness and accuracy of person identification. In this paper, we present our approach for combining face and speaker identification technologies and experimentally demonstrate a fused multi-biometric system that achieves a 50% reduction in equal error rate relative to the better of the two independent systems.
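To make the evaluation metric concrete, the following minimal sketch shows how an equal error rate (EER) can be estimated from match scores and how a relative reduction in EER is computed. It is an illustration only: the weighted-sum fusion rule, the function names, and the toy scores are assumptions introduced here, not the fusion method or data of the paper.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        # False rejection rate: genuine trials scoring below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance rate: impostor trials at or above the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def fuse(face_scores, voice_scores, weight=0.5):
    """Hypothetical fusion rule: weighted sum of per-trial scores."""
    return [weight * f + (1 - weight) * v
            for f, v in zip(face_scores, voice_scores)]


def relative_reduction(baseline_eer, fused_eer):
    """Fractional EER reduction relative to a baseline system."""
    return (baseline_eer - fused_eer) / baseline_eer


# Made-up toy scores (higher means "more likely the claimed person").
face_genuine = [0.9, 0.8, 0.3, 0.7]
voice_genuine = [0.4, 0.9, 0.8, 0.85]
face_impostor = [0.2, 0.5, 0.1, 0.3]
voice_impostor = [0.3, 0.2, 0.6, 0.1]

face_eer = equal_error_rate(face_genuine, face_impostor)
voice_eer = equal_error_rate(voice_genuine, voice_impostor)
fused_eer = equal_error_rate(
    fuse(face_genuine, voice_genuine),
    fuse(face_impostor, voice_impostor),
)

print(f"face EER={face_eer:.2f} voice EER={voice_eer:.2f} fused EER={fused_eer:.2f}")
```

In this toy data each modality fails on a different trial, so the fused scores separate genuine and impostor trials better than either modality alone. A "50% reduction in equal error rate over the better of the two independent systems" corresponds to `relative_reduction(min(face_eer, voice_eer), fused_eer)` evaluating to 0.5; the toy numbers above are not meant to reproduce that figure.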
