Learning to look up: Realtime monocular gaze correction using machine learning

We revisit the well-known problem of gaze correction and present a solution based on supervised machine learning. At training time, our system observes pairs of images, where each pair contains the face of the same person with a fixed angular difference in gaze direction. It then learns to synthesize the second image of a pair from the first one. After learning, the system can redirect the gaze of a previously unseen person by the same angular difference as in the training set. Unlike many previous solutions to the gaze problem in videoconferencing, ours is purely monocular, i.e. it requires no hardware apart from the built-in webcam of a laptop. Because it is based on efficient machine learning predictors such as decision forests, the system is fast: it runs in real time on a single core of a modern laptop. In the paper, we demonstrate results on a variety of videoconferencing frames and evaluate the method quantitatively on a hold-out set of registered images. The supplementary video shows example sessions of our system at work.
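
The abstract does not say how the second image of a pair is synthesized from the first. The toy sketch below assumes one possible formulation, offered as an illustration rather than as the paper's method: for every output pixel, a single regression tree picks a vertical offset from a small hypothetical candidate set, and the output pixel is copied from that offset in the input. Because a training pair gives only the target image and not the offsets, each training pixel records the reconstruction error of every candidate, and each leaf stores the candidate with the smallest summed error. The features (intensity and row index), the candidate set, the helper names (`make_samples`, `build_tree`, `redirect`, `toy_eye`) and the synthetic "eye" images (an eyelid row, a bright background, a dark 2x2 iris) are all hypothetical.

```python
"""Toy sketch (hypothetical, not the paper's method): a weakly supervised
regression tree that picks, for each output pixel, a vertical offset and
copies the input pixel found at that offset."""
from __future__ import annotations

Image = list[list[float]]
CANDIDATES = (-2, -1, 0, 1, 2)  # hypothetical candidate vertical offsets, in pixels


def sample(img: Image, y: int, x: int) -> float:
    """Read a pixel, clamping coordinates to the image border."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def features(img: Image, y: int, x: int) -> tuple[float, float]:
    """Hypothetical per-pixel features: intensity and row index."""
    return (img[y][x], float(y))


def make_samples(pairs: list[tuple[Image, Image]]):
    """For every pixel, record its features and the error of each candidate offset."""
    samples = []
    for src, dst in pairs:
        for y in range(len(src)):
            for x in range(len(src[0])):
                errs = [abs(sample(src, y + d, x) - dst[y][x]) for d in CANDIDATES]
                samples.append((features(src, y, x), errs))
    return samples


def leaf_choice(samples) -> tuple[float, int]:
    """Pick the offset with the smallest summed error (ties: smaller |offset|)."""
    cost, _, d = min(
        (sum(errs[i] for _, errs in samples), abs(d), d) for i, d in enumerate(CANDIDATES)
    )
    return cost, d


def build_tree(samples, depth: int = 0, max_depth: int = 3):
    """Greedily split on feature thresholds while the summed leaf error drops."""
    cost, d = leaf_choice(samples)
    if depth == max_depth or cost == 0:
        return ("leaf", d)
    best = None
    for k in range(len(samples[0][0])):
        values = sorted({f[k] for f, _ in samples})
        for t in values[:-1]:
            left = [s for s in samples if s[0][k] <= t]
            right = [s for s in samples if s[0][k] > t]
            total = leaf_choice(left)[0] + leaf_choice(right)[0]
            if best is None or total < best[0]:
                best = (total, k, t, left, right)
    if best is None or best[0] >= cost:
        return ("leaf", d)
    _, k, t, left, right = best
    return ("split", k, t,
            build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))


def predict_offset(tree, f) -> int:
    """Walk from the root to a leaf and return the stored offset."""
    while tree[0] == "split":
        _, k, t, left, right = tree
        tree = left if f[k] <= t else right
    return tree[1]


def redirect(tree, src: Image) -> Image:
    """Synthesize the output by copying each pixel from its predicted offset."""
    return [[sample(src, y + predict_offset(tree, features(src, y, x)), x)
             for x in range(len(src[0]))] for y in range(len(src))]


def toy_eye(iris_row: int, iris_col: int, h: int = 8, w: int = 6) -> Image:
    """Hypothetical eye: eyelid row 0 (0.5), background 1.0, dark 2x2 iris (0.0)."""
    img = [[1.0] * w for _ in range(h)]
    img[0] = [0.5] * w
    for y in (iris_row, iris_row + 1):
        for x in (iris_col, iris_col + 1):
            img[y][x] = 0.0
    return img


# Training pairs: the same toy eye before and after a fixed one-row "look up".
TRAIN_PAIRS = [(toy_eye(r, c), toy_eye(r - 1, c)) for r in (3, 4, 5) for c in (0, 2)]
TREE = build_tree(make_samples(TRAIN_PAIRS))

# An iris position never seen in training is moved up by the same amount.
print(redirect(TREE, toy_eye(4, 3)) == toy_eye(3, 3))  # True
```

On this toy data the tree learns to keep the eyelid row in place and to shift every other row up by one pixel, and it applies the same shift to an iris position that never appeared in training. This only loosely mirrors, at toy scale, the claim that a fixed redirection learned from pairs transfers to unseen inputs; a practical system would need richer features and an ensemble of trees (a decision forest) rather than one tree.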
