Speaker and Digit Recognition by Audio-Visual Lip Biometrics

This paper proposes a robust bimodal audio-visual digit and speaker recognition system based on lip-motion and speech biometrics. To increase the robustness of digit and speaker recognition, we propose a method that uses speaker lip-motion information extracted from low-resolution video sequences (128 × 128 pixels). We investigate a biometric system for digit recognition and speaker identification based on line-motion estimation combined with speech information and Support Vector Machines. The acoustic and visual features are fused at the feature level, showing favourable results on the XM2VTS database: digit recognition rates of 83% to 100% and a speaker recognition rate of 100%.
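As an illustration of what fusing features "at the feature level" can mean in practice, the following is a minimal sketch in which frame-aligned acoustic and visual feature vectors are normalized and concatenated before being passed to a classifier. The abstract does not specify the normalization, the feature dimensions, or the frame alignment, so the function names (`zscore`, `fuse_features`), the z-score step, and the toy data are all assumptions made for the example, not the authors' implementation.

```python
from statistics import mean, pstdev


def zscore(frames):
    """Scale each feature dimension to zero mean and unit variance across frames."""
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    mus = [mean(d) for d in dims]
    # A constant dimension has zero deviation; fall back to 1.0 to avoid dividing by zero.
    sigmas = [pstdev(d) or 1.0 for d in dims]
    return [[(x - m) / s for x, m, s in zip(f, mus, sigmas)] for f in frames]


def fuse_features(acoustic, visual):
    """Feature-level fusion: concatenate per-frame acoustic and visual vectors."""
    if len(acoustic) != len(visual):
        raise ValueError("modalities must be aligned to the same number of frames")
    return [a + v for a, v in zip(zscore(acoustic), zscore(visual))]


# Toy, made-up data (assumption): 3 frames, 2 acoustic features and 1 visual feature each.
acoustic = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
visual = [[0.5], [0.5], [0.8]]
fused = fuse_features(acoustic, visual)
```

Each fused frame here has three values (two acoustic, one visual). In a system like the one described, such joint vectors would be the input to the Support Vector Machine, in contrast to decision-level fusion, where each modality is classified separately and only the scores are combined.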

Josef Bigün | Maycel Isaac Faraj
