Visual speech understanding using independent component analysis

This work aims to improve visual speech recognition by using independent component analysis (ICA) to extract statistically independent visual features. First, we compute optical flow fields between consecutive frames of people speaking. We then apply ICA to these optical flow fields to derive a set of basis flow fields; the coefficients of each flow field with respect to this basis serve as the visual features. We show that applying ICA to optical flow fields yields better classification results than traditional approaches such as those based on principal component analysis (PCA). The approach is evaluated on the Tulips1 database and compared with standard approaches.
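The feature extraction step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the dense optical flow fields have already been computed by some external method and stacked into an array of shape (n_frames, height, width, 2), and it uses a generic symmetric FastICA iteration with a tanh nonlinearity as a stand-in for whichever ICA algorithm the paper uses. The function names `whiten`, `fast_ica`, and `ica_flow_features` are hypothetical, chosen for illustration only.

```python
import numpy as np


def whiten(X, n_components):
    """Center X (n_samples, n_features) and whiten it in its top PCA subspace."""
    mean = X.mean(axis=0)
    Xc = X - mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    S, Vt = S[:n_components], Vt[:n_components]
    n = X.shape[0]
    K = (Vt.T / S) * np.sqrt(n)   # whitening matrix, shape (n_features, k)
    Z = Xc @ K                    # whitened data: Z.T @ Z / n is the identity
    return Z, K, mean


def sym_decorrelate(W):
    """Make the rows of W orthonormal: W <- (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W


def fast_ica(Z, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA (tanh nonlinearity) on whitened data Z of shape (n, k)."""
    n, k = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        Y = Z @ W.T                       # current source estimates, (n, k)
        G = np.tanh(Y)
        G_prime = 1.0 - G ** 2
        # Fixed-point update per row: E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = (G.T @ Z) / n - G_prime.mean(axis=0)[:, None] * W
        W_new = sym_decorrelate(W_new)
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return W


def ica_flow_features(flows, n_components):
    """Return ICA coefficients (features), basis flow fields, and the mean flow.

    flows: array of shape (n_frames, height, width, 2), assumed precomputed.
    """
    n = flows.shape[0]
    X = flows.reshape(n, -1)              # one flattened flow field per row
    Z, K, mean = whiten(X, n_components)
    W = fast_ica(Z)
    coeffs = Z @ W.T                      # (n_frames, k) visual features
    basis = np.linalg.pinv(K @ W.T)       # (k, n_features) basis flow fields
    return coeffs, basis.reshape(n_components, *flows.shape[1:]), mean
```

In this sketch, each row of `coeffs` would be the feature vector passed to a classifier, and the whitened matrix `Z` from `whiten` corresponds to (rescaled) PCA coefficients, which gives a natural baseline for the PCA comparison mentioned above. Treating each flow field as one observation is one of several possible ICA arrangements and is an assumption here.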
