Optical flow based lip reading using non rectangular ROI and head motion reduction

The mouth and lips are deformed by the underlying muscles, and these deformations appear on the surface of the skin. Optical flow can be estimated from these deformations. In this study, we investigate optical flow-based lip reading. In natural speech, speakers move their heads slightly, and speaking without any head movement is restrictive. However, traditional optical flow-based approaches have not considered the speaker's head movement. Moreover, skin movement varies with location on the face, yet traditional approaches consider only a rectangular region of interest (ROI). This study addresses both problems and proposes a novel optical flow-based lip reading method that reduces the effect of head motion and employs ROIs based on facial features. We demonstrate that our method achieves the highest performance among the lip reading methods compared.
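The abstract does not state how head motion is reduced or how the ROI is constructed, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that head motion can be approximated as a global translation, estimated as the median flow vector outside the ROI, and that the non-rectangular ROI is a polygon (here a hand-picked diamond standing in for a region derived from facial feature points). The function names `point_in_polygon`, `polygon_mask`, and `reduce_head_motion` are hypothetical, and the flow field is synthetic rather than computed from video.

```python
from statistics import median


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon [(x0, y0), ...]."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_mask(height, width, polygon):
    """Boolean mask of pixels whose centres fall inside the polygon."""
    return [
        [point_in_polygon(c + 0.5, r + 0.5, polygon) for c in range(width)]
        for r in range(height)
    ]


def reduce_head_motion(flow, roi_mask):
    """Subtract an estimated global head translation from flow inside the ROI.

    flow[r][c] is a (u, v) vector. The head translation is assumed (not taken
    from the paper) to be the per-component median of the flow outside the ROI.
    Returns {(row, col): (u, v)} for ROI pixels only.
    """
    outside = [
        flow[r][c]
        for r in range(len(flow))
        for c in range(len(flow[0]))
        if not roi_mask[r][c]
    ]
    head_u = median(u for u, _ in outside)
    head_v = median(v for _, v in outside)
    return {
        (r, c): (flow[r][c][0] - head_u, flow[r][c][1] - head_v)
        for r in range(len(flow))
        for c in range(len(flow[0]))
        if roi_mask[r][c]
    }


# Synthetic 6x6 example: the whole face translates by (1.0, 0.5),
# and pixels inside the ROI additionally move by (0.0, 2.0).
HEIGHT, WIDTH = 6, 6
roi_polygon = [(0.8, 3.0), (3.0, 0.8), (5.2, 3.0), (3.0, 5.2)]  # diamond
mask = polygon_mask(HEIGHT, WIDTH, roi_polygon)
flow = [
    [(1.0, 2.5) if mask[r][c] else (1.0, 0.5) for c in range(WIDTH)]
    for r in range(HEIGHT)
]
residual = reduce_head_motion(flow, mask)
```

In this toy setting the residual vectors inside the ROI contain only the local motion, since the shared translation is removed. With real video, the flow would come from an optical flow estimator, and a translation-only model would be a simplification if the head also rotates or changes scale.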
