Nonlinear scale decomposition based features for visual speech recognition

A filter structure based on mathematical morphology, called a sieve, is used to process image sequences of a talker's mouth and form visual speech features. The effects on recognition accuracy of varying the filter type, the post-processing, and the hidden Markov model (HMM) parameters are investigated using two audio-visual speech databases.
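As an illustration of how a sieve decomposes a signal by scale, the following is a minimal sketch of a one-dimensional open-close sieve. It assumes flat (segment) structuring elements and, at each scale s, an opening followed by a closing over s + 1 samples. The differences between successive stages are the "granules" at that scale. The helper names (`opening`, `closing`, `sieve`, `scale_histogram`) and the choice of summing absolute granule amplitudes per scale are hypothetical. The specific filter types and post-processing compared in the paper are not reproduced here. In an image front end, one might apply such a 1-D sieve to each column of a mouth image and accumulate the per-column histograms; that step is likewise an assumption.

```python
from typing import List, Sequence, Tuple


def opening(f: Sequence[float], length: int) -> List[float]:
    """Flat 1-D morphological opening with a segment of `length` samples.

    Lowers peaks narrower than `length` samples; windows never extend
    past the ends of the signal.
    """
    n = len(f)
    if not 1 <= length <= n:
        raise ValueError("length must be between 1 and len(f)")
    # Erosion: minimum over every window that fits inside the signal.
    mins = [min(f[i:i + length]) for i in range(n - length + 1)]
    # Dilation: each sample takes the largest window-minimum covering it.
    return [max(mins[max(0, x - length + 1):min(x, n - length) + 1])
            for x in range(n)]


def closing(f: Sequence[float], length: int) -> List[float]:
    """Flat 1-D closing (dual of opening): fills valleys narrower than `length`."""
    return [-v for v in opening([-v for v in f], length)]


def sieve(f: Sequence[float], max_scale: int) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical open-close sieve: per-scale granule signals plus residual.

    At scale s the current signal is opened, then closed, with a segment of
    s + 1 samples; the change between successive stages is the granule
    signal for that scale.
    """
    if max_scale >= len(f):
        raise ValueError("max_scale must be smaller than the signal length")
    current = list(f)
    granules: List[List[float]] = []
    for s in range(1, max_scale + 1):
        nxt = closing(opening(current, s + 1), s + 1)
        granules.append([a - b for a, b in zip(current, nxt)])
        current = nxt
    return granules, current


def scale_histogram(f: Sequence[float], max_scale: int) -> List[float]:
    """Assumed feature: total absolute granule amplitude at each scale."""
    granules, _ = sieve(f, max_scale)
    return [sum(abs(g) for g in gs) for gs in granules]


if __name__ == "__main__":
    # A width-1 peak of height 5 and a width-2 plateau of height 3.
    print(scale_histogram([0, 0, 5, 0, 0, 0, 3, 3, 0, 0], 3))  # [5, 6, 0]
```

Because each granule is the difference between consecutive stages, the residual plus the sum of all granules reconstructs the input exactly, so no information is lost by the decomposition itself; any feature reduction happens in the post-processing (here, the histogram).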
