Paper Information - A New Manifold Representation for Visual Speech Recognition

In this paper, we propose a new manifold representation for visual speech recognition. The developed system consists of three main steps: a) extracting the lips from the input video data, b) generating expectation-maximization PCA (EMPCA) manifolds for the entire image sequence and performing manifold interpolation and re-sampling, and c) classifying the manifolds with an HMM classifier to identify the words described by the lip motions in the input video sequence.
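Step b) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the lip regions have already been cropped and flattened into an `(n_frames, n_pixels)` array, uses the standard EM iteration for PCA (Roweis's formulation), and re-samples the resulting low-dimensional trajectory with plain linear interpolation, which is an assumption because the abstract does not name the interpolation scheme. The function names `empca` and `resample_manifold` and all parameter defaults are hypothetical. Lip extraction and HMM classification are not covered.

```python
import numpy as np


def empca(frames, n_components=3, n_iter=50, seed=0):
    """EM algorithm for PCA on an (n_frames, n_pixels) array.

    Returns (points, basis, mean): the low-dimensional trajectory of shape
    (n_frames, n_components), an orthonormal basis of shape
    (n_pixels, n_components), and the mean frame.
    """
    Y = np.asarray(frames, dtype=float)
    mean = Y.mean(axis=0)
    Yc = (Y - mean).T  # centered data, shape (n_pixels, n_frames)
    rng = np.random.default_rng(seed)
    C = rng.standard_normal((Yc.shape[0], n_components))  # random initial basis
    for _ in range(n_iter):
        # E-step: latent coordinates X = (C^T C)^{-1} C^T Y.
        X = np.linalg.solve(C.T @ C, C.T @ Yc)
        # M-step: new basis C = Y X^T (X X^T)^{-1}.
        C = np.linalg.solve(X @ X.T, X @ Yc.T).T
    # Orthonormalize the learned basis, then project every frame onto it.
    basis, _ = np.linalg.qr(C)
    points = (basis.T @ Yc).T
    return points, basis, mean


def resample_manifold(points, n_samples=20):
    """Resample a trajectory to n_samples points (linear interpolation,
    an assumption; the original scheme is not specified in the abstract)."""
    points = np.asarray(points, dtype=float)
    src = np.linspace(0.0, 1.0, len(points))   # original frame positions
    dst = np.linspace(0.0, 1.0, n_samples)     # evenly spaced new positions
    return np.column_stack(
        [np.interp(dst, src, points[:, j]) for j in range(points.shape[1])]
    )
```

Calling `empca(frames)` followed by `resample_manifold(points, n_samples)` yields a fixed-length sequence of low-dimensional vectors per utterance, so sequences with different frame counts become comparable before they are passed to a classifier.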
