Video Based Person Authentication via Audio/Visual Association

Multi-modal person authentication systems can achieve higher performance and robustness by combining different modalities. Current fusion strategies mainly combine modalities at the level of the outputs of the individual classifiers. However, there is fine-grained structure linking facial movement and the speech signal that output-level fusion cannot exploit. In this paper, audio/visual association, a lower-level fusion, is proposed to fuse the information between lip movement and the speech signal. The experimental results indicate that this fusion strategy improves the performance of the multi-modal person authentication system.
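To make the contrast concrete, the following is a minimal sketch (not the authors' implementation) of the two fusion levels, assuming frame-synchronous lip-movement descriptors and cepstral speech features and using Gaussian mixture models as stand-in per-client models; feature dimensions, mixture sizes, and the concatenation scheme are illustrative assumptions only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical enrollment data for one client: lip-movement descriptors and
# speech (e.g. cepstral) frames. Shapes and feature choices are illustrative.
lip_feats = np.random.randn(500, 12)     # 500 frames x 12 lip-shape parameters
speech_feats = np.random.randn(500, 13)  # 500 frames x 13 cepstral coefficients

# Output-level (late) fusion: model each modality separately, combine scores.
lip_model = GaussianMixture(n_components=8).fit(lip_feats)
speech_model = GaussianMixture(n_components=8).fit(speech_feats)

def score_level(lip_x, speech_x, w=0.5):
    """Weighted sum of per-modality average log-likelihoods."""
    return w * lip_model.score(lip_x) + (1 - w) * speech_model.score(speech_x)

# Lower-level (audio/visual association) fusion: model the joint,
# frame-synchronous lip + speech vector so cross-modal correlations
# are captured inside a single model rather than lost at the output stage.
joint_feats = np.hstack([lip_feats, speech_feats])
joint_model = GaussianMixture(n_components=8).fit(joint_feats)

def feature_level(lip_x, speech_x):
    """Average log-likelihood of the concatenated audio-visual observation."""
    return joint_model.score(np.hstack([lip_x, speech_x]))
```

In this sketch the joint model can, in principle, penalize test samples whose lip motion and audio are individually plausible but mutually inconsistent, which output-level fusion cannot do.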
