The use of lip motion for biometric speaker identification

This paper addresses the selection of the best lip motion features for biometric open-set speaker identification, where the best features are those that yield the highest discrimination among individual speakers in a population. We first detect the face region in each video frame. Successive face regions are then registered by global motion compensation, and the lip region is segmented in each frame. The initial lip feature vector consists of the 2D-DCT coefficients of the optical flow vectors within the lip region at each frame. We propose to select the most discriminative features from the full set of transform coefficients using a probabilistic measure that maximizes the ratio of intra-class to inter-class probabilities. The resulting reduced-dimension discriminative feature vector is expected to maximize identification performance, and experimental results support that it improves identification performance.
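The feature pipeline described above can be illustrated with a minimal sketch. It assumes the optical flow inside the lip region is already available as an `(H, W, 2)` array, and every function name and the `keep` parameter are hypothetical. The abstract does not specify the probabilistic selection measure, so the sketch substitutes a simple Fisher-style ratio of between-class to within-class variance as a stand-in; it is not the authors' criterion.

```python
import numpy as np


def dct_matrix(n):
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return m


def dct2(block):
    """Separable 2D DCT-II of a 2D array (rows, then columns)."""
    h, w = block.shape
    return dct_matrix(h) @ block @ dct_matrix(w).T


def lip_motion_features(flow, keep=4):
    """Build a per-frame feature vector from lip-region optical flow.

    flow: array of shape (H, W, 2) holding horizontal and vertical flow.
    keep: side of the low-frequency coefficient block retained per
          component (an assumption; the abstract gives no such number).
    """
    parts = [dct2(flow[:, :, c])[:keep, :keep].ravel() for c in range(2)]
    return np.concatenate(parts)


def fisher_scores(features, labels):
    """Per-dimension between-class / within-class variance ratio.

    Stand-in for the paper's probabilistic measure, not the same thing.
    """
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in np.unique(labels):
        x = features[labels == c]
        class_mean = x.mean(axis=0)
        between += len(x) * (class_mean - overall_mean) ** 2
        within += ((x - class_mean) ** 2).sum(axis=0)
    return between / (within + 1e-12)  # small constant avoids division by zero


def select_features(features, labels, n_keep):
    """Return sorted indices of the n_keep highest-scoring dimensions."""
    scores = fisher_scores(features, labels)
    return np.sort(np.argsort(scores)[::-1][:n_keep])
```

In use, `lip_motion_features` would be applied to each frame's lip-region flow, the vectors stacked into an `(N, D)` matrix with one speaker label per row, and `select_features` would pick the dimensions to keep before training a speaker model.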
