Visual Speech Recognition Method Using Translation, Scale and Rotation Invariant Features

This paper reports on a visual speech recognition method that is invariant to translation, rotation and scale. Dynamic features representing mouth motion are extracted from the video data using a motion segmentation technique termed the motion history image (MHI). The MHI is generated by applying an accumulative image-differencing technique to the sequence of mouth images. Invariant features are derived from the MHI with a feature extraction algorithm that combines the discrete stationary wavelet transform (SWT) and moments. A level-one 2-D SWT decomposes the MHI into one approximation sub-image and three detail sub-images. The feature descriptors consist of three types of moments (geometric moments, Hu moments and Zernike moments) computed from the SWT approximation image, and these moment features are normalized to achieve the invariance properties. An artificial neural network (ANN) trained with the back-propagation learning algorithm is used to classify the moment features. Initial experiments testing the sensitivity of the proposed approach to rotation, translation and scale of the mouth images yielded promising results.
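The MHI step can be illustrated with a small sketch. It assumes one common reading of "accumulative image differencing": consecutive frames are differenced, each difference is binarised with a threshold, and every moving pixel keeps the time index of the most recent frame in which it changed, so later motion appears brighter. The threshold value, the linear time weighting and the function name `motion_history_image` are hypothetical choices for illustration, not the paper's stated parameters.

```python
import numpy as np


def motion_history_image(frames, threshold=20.0):
    """Sketch of an MHI built by accumulative image differencing.

    frames: sequence of 2-D grayscale arrays of equal shape, in temporal order.
    threshold: hypothetical intensity-difference threshold for binarisation.
    Returns a float array in [0, 1]; brighter pixels moved more recently.
    """
    frames = [np.asarray(f, dtype=np.float64) for f in frames]
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    n_diffs = len(frames) - 1
    mhi = np.zeros_like(frames[0])
    for t in range(1, len(frames)):
        # Binarise the absolute difference between consecutive frames.
        moved = np.abs(frames[t] - frames[t - 1]) > threshold
        # Weight motion pixels by their time index and keep the most recent.
        mhi = np.maximum(mhi, moved * t)
    # Scale so the latest motion has intensity 1.0.
    return mhi / n_diffs
```

For example, with three 4×4 frames in which pixel (0, 0) changes between frames 0 and 1 and pixel (1, 1) changes between frames 1 and 2, the MHI holds 0.5 at (0, 0) and 1.0 at (1, 1), so the temporal order of the mouth movement is encoded in intensity.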
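The wavelet-plus-moments stage can be sketched for one of the three moment families. The abstract does not name the mother wavelet, so the sketch assumes Haar and implements the level-one undecimated (stationary) low-pass step directly in NumPy with periodic borders; PyWavelets provides `pywt.swt2` for the general case. It then computes Hu's seven invariants from scale-normalised central moments. Geometric and Zernike moments are not shown, and the function names `haar_swt_level1_approx` and `hu_moments` are hypothetical.

```python
import numpy as np


def haar_swt_level1_approx(img):
    """Approximation sub-image of a one-level 2-D stationary Haar transform.

    Assumes a Haar wavelet and periodic borders. The Haar low-pass taps are
    [1/sqrt(2), 1/sqrt(2)]; filtering rows then columns without downsampling
    gives half the sum of each 2x2 neighbourhood.
    """
    x = np.asarray(img, dtype=np.float64)
    rows = (x + np.roll(x, -1, axis=1)) / np.sqrt(2)
    return (rows + np.roll(rows, -1, axis=0)) / np.sqrt(2)


def hu_moments(img):
    """Hu's seven moment invariants of a non-negative 2-D intensity image."""
    f = np.asarray(img, dtype=np.float64)
    y, x = np.mgrid[:f.shape[0], :f.shape[1]]
    m00 = f.sum()
    if m00 == 0:
        raise ValueError("image has zero total intensity")
    # Centroid: subtracting it makes the moments translation invariant.
    xc, yc = (x * f).sum() / m00, (y * f).sum() / m00

    def eta(p, q):
        # Central moment divided by mu00^(1 + (p+q)/2) for scale invariance.
        mu = ((x - xc) ** p * (y - yc) ** q * f).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    # Hu's combinations of the normalised moments are rotation invariant.
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ])
```

Calling `hu_moments(haar_swt_level1_approx(mhi))` yields a 7-element feature vector of the kind that could be passed to a classifier. Because the Haar low-pass step here is a 2×2 box filter, shifting the MHI or rotating it by 90° (with the mouth region clear of the image border) leaves these values unchanged up to floating-point error; arbitrary-angle rotations and rescaling are only approximately invariant on a discrete pixel grid.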
