Lipreading from color video

We have designed and implemented a lipreading system that recognizes isolated words using only color video of human lips, without acoustic data. The system uses "snakes" (active contour models) to extract geometric visual features, the Karhunen-Loeve transform (KLT) to extract principal components in the color eigenspace, and hidden Markov models (HMMs) to recognize the combined visual feature sequences. Using visual information alone, we achieved 94% accuracy on ten isolated words.
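
The final recognition step can be illustrated with a minimal sketch of isolated-word classification by HMM likelihood: each word has its own model, and the word whose model assigns the highest likelihood to the observed feature sequence is chosen. The abstract does not give the HMM topology, the feature dimensions, or the vocabulary, so everything below is an assumption made for illustration: the two word labels, the two-state left-to-right models, their probabilities, and the use of discrete symbols (as if the combined snake and KLT features had been vector-quantized into three codes).

```python
import math


def log_forward(obs, start, trans, emit):
    """Return log P(obs | model) using the scaled forward algorithm."""
    n = len(start)
    # Initialise with the first observation symbol.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob


def classify(obs, models):
    """Pick the word whose HMM gives the observation sequence the highest likelihood."""
    return max(models, key=lambda word: log_forward(obs, *models[word]))


# Hypothetical models: (start probabilities, transition matrix, emission matrix).
# Symbols 0..2 stand in for quantized visual feature vectors (an assumption).
WORD_MODELS = {
    "open": (
        [1.0, 0.0],
        [[0.7, 0.3], [0.0, 1.0]],
        [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]],
    ),
    "close": (
        [1.0, 0.0],
        [[0.7, 0.3], [0.0, 1.0]],
        [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]],
    ),
}

if __name__ == "__main__":
    print(classify([0, 0, 2, 2], WORD_MODELS))
```

A real system of this kind would more likely use continuous-density emissions over the feature vectors and models trained from data (for example with Baum-Welch); the discrete, hand-set models here only show the decision rule.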
