An extended pose-invariant lipreading system

In recent work, we have concentrated on the problem of lipreading from non-frontal views (poses). In particular, we have focused on the use of profile views, and proposed two approaches for lipreading on the basis of visual features extracted from such views: (a) direct statistical modeling of the features, namely the use of view-dependent statistical models; and (b) normalization of such features by projecting them onto the "space" of frontal-view visual features, which allows a single set of statistical models to be employed for all available views. The latter approach has so far been considered for only two poses (frontal and profile views), and for visual features of one specific dimensionality. In this paper, we extend this work by investigating its applicability to the case where data from three views are available (frontal, left-profile, and right-profile). In addition, we examine the effect of visual feature dimensionality on the pose-normalization approach. Our experiments demonstrate that the results generalize well to three views, but also that feature dimensionality is crucial to the effectiveness of the approach. In particular, a feature dimensionality larger than 30 is detrimental to multi-pose visual speech recognition performance.
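To make approach (b) concrete, the following is a minimal sketch of how profile-view features might be projected onto the frontal-view feature space. It assumes a regularized linear least-squares (affine) mapping learned from time-synchronous frontal and profile feature pairs; the abstract does not specify the form of the projection, so this choice, the function names `fit_pose_projection` and `project_to_frontal`, the `ridge` parameter, and the synthetic data are all illustrative assumptions rather than the system described here.

```python
import numpy as np


def fit_pose_projection(profile_feats, frontal_feats, ridge=1e-3):
    """Fit an affine map W such that [profile, 1] @ W approximates frontal.

    Both inputs are (num_frames, dim) arrays of paired features taken from
    the same frames in the two views (an assumption of this sketch).
    """
    ones = np.ones((profile_feats.shape[0], 1))
    X = np.hstack([profile_feats, ones])  # append a bias column
    # Ridge-regularized normal equations: (X^T X + ridge * I) W = X^T Y
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ frontal_feats)


def project_to_frontal(profile_feats, W):
    """Map profile-view features into the frontal-view feature space."""
    ones = np.ones((profile_feats.shape[0], 1))
    return np.hstack([profile_feats, ones]) @ W


# Synthetic stand-in data: 500 paired frames of 30-dimensional features.
rng = np.random.default_rng(0)
num_frames, dim = 500, 30
frontal = rng.normal(size=(num_frames, dim))
hypothetical_view_change = rng.normal(size=(dim, dim))
profile = frontal @ hypothetical_view_change
profile = profile + 0.01 * rng.normal(size=(num_frames, dim))

W = fit_pose_projection(profile, frontal)
normalized = project_to_frontal(profile, W)

# Mean squared distance to the frontal features, before and after projection.
raw_error = float(np.mean((profile - frontal) ** 2))
normalized_error = float(np.mean((normalized - frontal) ** 2))
```

Under this sketch, one such mapping would be fitted per non-frontal view (for example, one for the left profile and one for the right), after which the normalized features from every view could be passed to a single set of frontal-view statistical models.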
