Recent developments in automated lip-reading

Human lip-readers are increasingly being presented as useful in the gathering of forensic evidence but, like all humans, suffer from unreliability. Here we report the results of a long-term study in automatic lip-reading with the objective of converting video to text (V2T). The V2T problem is surprising in that some aspects that look tricky, such as real-time tracking of the lips on poor-quality interlaced video from hand-held cameras, prove to be relatively tractable, whereas speaker-independent lip-reading is very demanding because of unpredictable variation between people. We review the problem of automatic lip-reading for crime fighting and identify the critical parts of the problem.
