Learning an appearance-based gaze estimator from one million synthesised images

Learning-based methods for appearance-based gaze estimation achieve state-of-the-art performance in challenging real-world settings but require large amounts of labelled training data. Learning-by-synthesis was proposed as a promising solution to this problem but current methods are limited with respect to speed, appearance variability, and the head pose and gaze angle distribution they can synthesize. We present UnityEyes, a novel method to rapidly synthesize large amounts of variable eye region images as training data. Our method combines a novel generative 3D model of the human eye region with a real-time rendering framework. The model is based on high-resolution 3D face scans and uses real-time approximations for complex eyeball materials and structures as well as anatomically inspired procedural geometry methods for eyelid animation. We show that these synthesized images can be used to estimate gaze in difficult in-the-wild scenarios, even for extreme gaze angles or in cases in which the pupil is fully occluded. We also demonstrate competitive gaze estimation results on a benchmark in-the-wild dataset, despite only using a light-weight nearest-neighbor algorithm. We are making our UnityEyes synthesis framework available online for the benefit of the research community.

[1]  Peter Robinson,et al.  Rendering of Eyes for Eye-Shape Registration and Gaze Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2]  Yoichi Sato,et al.  Learning-by-Synthesis for Appearance-Based 3D Gaze Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Verónica Orvalho,et al.  A Facial Rigging Survey , 2012, Eurographics.

[4]  Takahiro Okabe,et al.  Inferring human gaze from appearance via adaptive linear regression , 2011, 2011 International Conference on Computer Vision.

[5]  Derek Bradley,et al.  Detailed spatio-temporal reconstruction of eyelids , 2015, ACM Trans. Graph..

[6]  Sami Romdhani,et al.  A 3D Face Model for Pose and Illumination Invariant Face Recognition , 2009, 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance.

[7]  George Borshukov,et al.  Pre-Integrated Skin Shading , 2011 .

[8]  Paul E. Debevec,et al.  Image-based lighting , 2002, IEEE Computer Graphics and Applications.

[9]  C. Evinger,et al.  Eyelid movements. Mechanisms and normal data. , 1991, Investigative ophthalmology & visual science.

[10]  Derek Bradley,et al.  High-quality capture of eyes , 2014, ACM Trans. Graph..

[11]  Mario Fritz,et al.  Appearance-based gaze estimation in the wild , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  David J. Kriegman,et al.  Localizing Parts of Faces Using a Consensus of Exemplars , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Erick Miller,et al.  Realistic eye motion using procedural geometric methods , 2009, SIGGRAPH '09.

[15]  Jihun Yu,et al.  Realtime facial animation with on-the-fly correctives , 2013, ACM Trans. Graph..

[16]  Peter Shirley,et al.  Fundamentals of computer graphics , 2018 .

[17]  Thomas S. Huang,et al.  Interactive Facial Feature Localization , 2012, ECCV.

[18]  Luc Van Gool,et al.  Random Forests for Real Time 3D Face Analysis , 2012, International Journal of Computer Vision.

[19]  Qiong Huang,et al.  TabletGaze: A Dataset and Baseline Algorithms for Unconstrained Appearance-based Gaze Estimation in Mobile Tablets , 2015, ArXiv.

[20]  Yiying Tong,et al.  FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.

[21]  Norman I. Badler,et al.  Look me in the Eyes: A Survey of Eye and Gaze Animation for Virtual Agents and Artificial Systems , 2014, Eurographics.

[22]  André Messias,et al.  Upper and lower eyelid saccades describe a harmonic oscillator function. , 2005, Investigative ophthalmology & visual science.

[23]  Hanspeter Pfister,et al.  Face transfer with multilinear models , 2005, ACM Trans. Graph..

[24]  Matthew Turk,et al.  A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.

[25]  Steven K. Feiner,et al.  Gaze locking: passive eye contact detection for human-object interaction , 2013, UIST.

[26]  Charles T. Loop,et al.  Smooth Subdivision Surfaces Based on Triangles , 1987 .

[27]  Stefanos Zafeiriou,et al.  300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[28]  Takahiro Okabe,et al.  Head pose-free appearance-based gaze sensing via eye image synthesis , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[29]  Jean-Marc Odobez,et al.  Gaze estimation from multimodal Kinect data , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.