A New Manifold Representation for Visual Speech Recognition

In this paper, we propose a new manifold representation for visual speech recognition. The developed system consists of three main steps: a) extracting the lips from the input video data, b) generating the expectation-maximization PCA (EMPCA) manifolds for the entire image sequence and performing manifold interpolation and re-sampling, and c) classifying the manifolds with an HMM classifier to identify the words described by the lip motions in the input video sequence.
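
Step b) can be illustrated with a minimal sketch, which is not the implementation used in the paper. It assumes that the lip regions have already been extracted as equally sized grayscale frames, uses the standard EM iteration for PCA to estimate a low-dimensional subspace, and re-samples the projected trajectory to a fixed number of points by linear interpolation. The function names `empca` and `sequence_manifold`, the parameters `k`, `n_iter`, and `n_samples`, and the choice of linear interpolation are all assumptions made for illustration. Lip extraction and HMM classification are not shown.

```python
import numpy as np


def empca(Y, k, n_iter=50, seed=0):
    """Estimate a k-dimensional principal subspace with the EM algorithm for PCA.

    Y is a (d, n) matrix whose n columns are mean-centred data vectors.
    Returns a (d, k) matrix with orthonormal columns spanning the subspace.
    """
    rng = np.random.default_rng(seed)
    d, _ = Y.shape
    C = rng.standard_normal((d, k))  # random initial basis (assumption)
    for _ in range(n_iter):
        # E-step: latent coordinates X = (C^T C)^{-1} C^T Y
        X = np.linalg.solve(C.T @ C, C.T @ Y)
        # M-step: new basis C = Y X^T (X X^T)^{-1}
        C = np.linalg.solve(X @ X.T, X @ Y.T).T
    # Orthonormalise the basis so that projections are well scaled
    Q, _ = np.linalg.qr(C)
    return Q


def sequence_manifold(frames, k=3, n_samples=20):
    """Project a lip image sequence into k dimensions and re-sample it.

    frames is an (n_frames, height, width) array of hypothetical lip crops.
    Returns an (n_samples, k) array: a fixed-length trajectory in the subspace.
    """
    n = frames.shape[0]
    Y = frames.reshape(n, -1).astype(float).T   # (d, n), one column per frame
    Y = Y - Y.mean(axis=1, keepdims=True)       # centre each pixel over time
    basis = empca(Y, k)
    traj = (basis.T @ Y).T                      # (n, k) point per frame
    # Linear interpolation over normalised time (assumed; other schemes are possible)
    t_old = np.linspace(0.0, 1.0, n)
    t_new = np.linspace(0.0, 1.0, n_samples)
    return np.stack(
        [np.interp(t_new, t_old, traj[:, j]) for j in range(k)], axis=1
    )
```

Under these assumptions, sequences of different lengths are mapped to trajectories with the same number of points, which is one way to obtain comparable observation sequences for a classifier such as the HMM in step c).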
