Lipreading by Locality Discriminant Graph

A major challenge in building a good lipreading system is extracting effective visual features from the enormous quantity of video sequence data. For appearance-based feature analysis in lipreading, classical methods such as DCT, PCA and LDA are usually applied for dimensionality reduction. We present a new pattern classification algorithm, called locality discriminant graph (LDG), and develop a novel lipreading framework that applies LDG to the problem. LDG combines the advantages of manifold learning and the Fisher criterion to seek a linear embedding that preserves local neighborhood affinity within the same class while discriminating between neighborhoods of different classes. The LDG embedding is computed in closed form and is tuned by a single free parameter, the number of nearest neighbors k. Experiments on the AVICAR corpus provide evidence that graph-based pattern classification methods can outperform classical ones for lipreading.
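The abstract does not state the LDG objective, so the following is only a minimal sketch of a generic graph-based discriminant embedding in the same spirit, not the authors' formulation. It assumes a within-class k-NN graph and a between-class k-NN graph with 0/1 weights, forms their graph Laplacians, and solves a Fisher-style generalized eigenproblem in closed form. The function names `knn_graphs` and `fit_graph_discriminant_embedding`, the binary edge weights, and the regularization term `reg` are all illustrative assumptions.

```python
import numpy as np
from scipy.linalg import eigh


def knn_graphs(X, y, k):
    """Return symmetric 0/1 adjacency matrices (within-class, between-class).

    X: (n, d) array of feature vectors, y: (n,) array of class labels.
    Assumption: binary edge weights; the paper may weight edges differently.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    same = y[:, None] == y[None, :]
    W_w = np.zeros((n, n))
    W_b = np.zeros((n, n))
    for i in range(n):
        d_same = np.where(same[i], d2[i], np.inf)   # same-class candidates
        d_diff = np.where(~same[i], d2[i], np.inf)  # other-class candidates
        for d, W in ((d_same, W_w), (d_diff, W_b)):
            idx = np.argsort(d)[:k]
            idx = idx[np.isfinite(d[idx])]  # drop excluded candidates
            W[i, idx] = 1.0
    # Symmetrize: keep an edge if either endpoint selected the other.
    return np.maximum(W_w, W_w.T), np.maximum(W_b, W_b.T)


def fit_graph_discriminant_embedding(X, y, k=5, dim=2, reg=1e-3):
    """Return a (d, dim) projection matrix from a generalized eigenproblem.

    Maximizes between-class neighbor scatter relative to within-class
    neighbor scatter. `reg` is an assumed ridge term that keeps the
    within-class matrix positive definite.
    """
    W_w, W_b = knn_graphs(X, y, k)
    L_w = np.diag(W_w.sum(axis=1)) - W_w  # within-class graph Laplacian
    L_b = np.diag(W_b.sum(axis=1)) - W_b  # between-class graph Laplacian
    d = X.shape[1]
    S_w = X.T @ L_w @ X
    S_b = X.T @ L_b @ X
    S_w = S_w + reg * (np.trace(S_w) / d) * np.eye(d)
    # eigh solves S_b v = lambda * S_w v with eigenvalues in ascending order.
    _, vecs = eigh(S_b, S_w)
    return vecs[:, ::-1][:, :dim]  # directions with the largest ratios
```

Given a feature matrix `X` and labels `y`, the low-dimensional features would be `X @ fit_graph_discriminant_embedding(X, y, k=5, dim=2)`. As in the abstract, `k` is the main parameter to tune; `reg` exists only for numerical stability in this sketch.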
