A PCA-based manifold representation for visual speech recognition

In this paper, we present a new principal component analysis (PCA)-based manifold representation for visual speech recognition. The real-time input video data is compressed using PCA, and the low-dimensional point computed for each frame defines one sample of the manifold. Since the number of frames in a video sequence depends on the complexity of the spoken word, the manifolds must be re-sampled to a fixed, pre-defined number of key-points before they can be used for visual speech classification. These key-points are then used as input to a hidden Markov model (HMM) classification scheme. We have applied the developed visual speech recognition system to a database of English words, and the experimental results indicate that the proposed approach produces accurate classification.
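The following is a minimal sketch of the pipeline described above, not the authors' implementation: frames are assumed to be flattened into vectors, PCA is fitted with scikit-learn, the variable-length trajectory is re-sampled to a fixed number of key-points by linear interpolation, and per-word Gaussian HMMs from the hmmlearn package are used for classification. The dimensionality, number of key-points, and HMM settings are illustrative choices, not the values used in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from hmmlearn.hmm import GaussianHMM


def video_to_keypoints(frames, n_components=15, n_keypoints=20):
    """Project frames with PCA and re-sample the trajectory to fixed key-points.

    frames: (num_frames, height*width) array of flattened grey-level frames.
    Returns a (n_keypoints, n_components) array of manifold key-points.
    """
    # PCA compression: each frame becomes a low-dimensional point; the ordered
    # sequence of points is the manifold of the utterance. (Hypothetical choice:
    # here PCA is fitted per video; the paper may fit it on training data.)
    pca = PCA(n_components=n_components)
    manifold = pca.fit_transform(frames)

    # Re-sample the variable-length trajectory to a fixed number of key-points
    # by linear interpolation along the normalised frame index.
    src = np.linspace(0.0, 1.0, num=len(manifold))
    dst = np.linspace(0.0, 1.0, num=n_keypoints)
    keypoints = np.column_stack(
        [np.interp(dst, src, manifold[:, d]) for d in range(n_components)]
    )
    return keypoints


def train_word_model(sequences, n_states=5):
    """Fit one Gaussian HMM on the key-point sequences of a single word."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model


def classify(keypoints, word_models):
    """Assign the word whose HMM gives the highest log-likelihood."""
    return max(word_models, key=lambda w: word_models[w].score(keypoints))
```

Re-sampling to a fixed number of key-points makes every utterance the same length, so a single HMM topology per word can be trained regardless of how many frames the original video contained.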
