Audio-Visual Authentication System over the Internet Protocol

This paper develops an audio-visual (AV) authentication system with the objective of increasing robustness to variation in face illumination. To address the illumination problem, a multiband feature fusion approach based on wavelet packet decomposition is proposed; it searches for the mid- and high-frequency subbands that are insensitive to illumination variation. Simulation results show that the multiband feature fusion approach achieves higher recognition accuracy than a previous study. To make the approach invariant not only to face illumination but also to facial expression variation, principal component analysis is employed in conjunction with multiband feature fusion. The AV authentication system is then implemented over the Internet Protocol (IP) to enable long-distance access. Because this system streams video and audio, the effects of speech and face compression on the recognition performance of the AV authentication system over IP are investigated. Experimental results show that the AV authentication system over IP, with a smaller data size, achieves the same recognition rate as the standalone system.
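The combination of wavelet packet subbands and principal component analysis described above can be illustrated with a minimal sketch. It is not the authors' implementation. The Haar filter, the two-level depth, the rule of dropping only the lowest-frequency subband, and the function names `haar_step`, `wavelet_packet`, `fused_features`, and `pca_project` are all assumptions made for illustration; the paper searches for suitable subbands rather than fixing them in advance.

```python
import numpy as np

def haar_step(x):
    """One 2-D Haar analysis step on an array with even height and width.

    Returns the four half-size subbands (LL, LH, HL, HH).
    """
    a = (x[0::2, :] + x[1::2, :]) / 2.0  # averages of adjacent rows
    d = (x[0::2, :] - x[1::2, :]) / 2.0  # differences of adjacent rows
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def wavelet_packet(x, levels):
    """Full wavelet packet tree: every subband is split again at each level."""
    bands = {"": np.asarray(x, dtype=float)}
    for _ in range(levels):
        split = {}
        for name, band in bands.items():
            for tag, sub in zip("ahvd", haar_step(band)):
                split[name + tag] = sub
        bands = split
    return bands

def fused_features(img, levels=2):
    """Concatenate all subbands except the lowest-frequency one.

    Assumption: the pure approximation band ("aa...") carries most of the
    slowly varying illumination, so it is excluded from the fused vector.
    """
    bands = wavelet_packet(img, levels)
    keep = [k for k in sorted(bands) if k != "a" * levels]
    return np.concatenate([bands[k].ravel() for k in keep])

def pca_project(X, n_components):
    """Project the rows of X onto their leading principal components."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

Under these assumptions, a uniform brightness offset added to an image changes only the excluded approximation band, so the fused feature vector is unaffected. Real illumination changes are not uniform, which is why a search over subbands is needed in practice.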
