Learning Multi-Boosted HMMs for Lip-Password Based Speaker Verification

This paper proposes the concept of a lip motion password (hereinafter simply called a lip-password), which consists of a password embedded in the lip movement together with the underlying characteristics of the lip motion. It provides double security for a visual speaker verification system, in which the speaker is verified simultaneously by the private password information and by the underlying behavioral biometrics of the lip motion. Accordingly, both a target speaker who says the wrong password and an impostor who knows the correct password will be detected and rejected. To this end, we present a multi-boosted Hidden Markov Model (HMM) learning approach for such a system. First, we extract a group of representative visual features to characterize each lip frame. Then, an effective lip motion segmentation algorithm is proposed to segment the lip-password sequence into a small set of distinguishable subunits. Subsequently, we integrate HMMs with a boosting learning framework, together with a random subspace method and a data sharing scheme, to formulate a precise decision boundary with high discrimination power for verifying these subunits. Finally, whether the lip-password was spoken by the target speaker with the pre-registered password is determined from all the subunit verification results obtained by the multi-boosted HMMs. Experimental results show that the proposed approach compares favorably with state-of-the-art methods.
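
The abstract does not give training details. The following is therefore only a minimal Python sketch of the generic idea: boost weak learners that are each trained on a random feature subspace, then combine the per-subunit decisions. Several assumptions apply. Decision stumps on fixed-length feature vectors stand in for the paper's HMM weak learners, and the data sharing scheme is omitted. The rule that every subunit must be accepted is a hypothetical fusion rule, and all function names are hypothetical.

```python
import math
import random


def train_stump(X, y, w, dims):
    """Fit a weighted decision stump using only the feature indices in `dims`.

    Stand-in weak learner (the paper uses HMMs). Returns (error, dim, thr, pol).
    """
    best = None
    for d in dims:
        values = sorted(set(x[d] for x in X))
        # Candidate thresholds: one below the minimum, then midpoints of neighbours.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for thr in thresholds:
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[d] > thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, d, thr, pol)
    return best


def stump_predict(stump, x):
    _, d, thr, pol = stump
    return pol if x[d] > thr else -pol


def boost_random_subspace(X, y, rounds=10, subspace_size=1, seed=0):
    """Discrete AdaBoost in which each weak learner sees a random feature subset."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    w = [1.0 / n] * n  # uniform initial sample weights
    ensemble = []
    for _ in range(rounds):
        dims = rng.sample(range(dim), subspace_size)  # random subspace for this round
        stump = train_stump(X, y, w, dims)
        err = max(stump[0], 1e-10)  # avoid division by zero for a perfect stump
        if err >= 0.5:
            continue  # no better than chance on this subspace; skip it
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def subunit_score(ensemble, x):
    """Weighted vote of the boosted ensemble; > 0 means 'accept this subunit'."""
    return sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)


def verify_lip_password(subunit_models, subunit_features):
    """Hypothetical fusion rule: accept only if every subunit is accepted."""
    if len(subunit_models) != len(subunit_features):
        raise ValueError("need exactly one feature vector per subunit")
    return all(subunit_score(m, x) > 0
               for m, x in zip(subunit_models, subunit_features))


if __name__ == "__main__":
    # Toy data: +1 = genuine subunit, -1 = impostor or wrong-password subunit.
    X = [[0.9, 0.8], [0.8, 0.95], [0.85, 0.9], [0.1, 0.2], [0.2, 0.05], [0.15, 0.1]]
    y = [1, 1, 1, -1, -1, -1]
    model = boost_random_subspace(X, y, rounds=5)
    print(verify_lip_password([model, model], [[0.9, 0.9], [0.85, 0.8]]))  # True
    print(verify_lip_password([model, model], [[0.9, 0.9], [0.1, 0.1]]))  # False
```

Because each round draws its own feature subset, the ensemble members differ more from one another than they would if all of them saw every feature. This diversity is the usual motivation for pairing the random subspace method with boosting. A real system would replace the stumps with HMMs trained on weighted lip-frame sequences.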
