Learning-based Region Selection for End-to-End Gaze Estimation

Traditionally, appearance-based gaze estimation methods use statically defined face regions, such as eye patches, as input to the gaze estimator, and therefore suffer under difficult lighting conditions and extreme head poses, for which these fixed regions are often not the most informative for the gaze estimation task. We posit that facial regions should instead be selected dynamically based on image content, and we propose a novel gaze estimation method that combines region proposal and gaze estimation in a single end-to-end trainable framework. We introduce a novel loss that allows unsupervised training of a region proposal network alongside the supervised training of the final gaze estimator. We show that our method learns meaningful region selection strategies and outperforms fixed-region approaches. We further show that our method performs particularly well on challenging cases, i.e., those with difficult lighting conditions such as directional lights, extreme head angles, or self-occlusion. Finally, the proposed method achieves better results than the current state-of-the-art method in both within-dataset and cross-dataset evaluations.
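To illustrate the idea of an end-to-end trainable pipeline in which region proposal and gaze estimation are optimized jointly, here is a minimal sketch in PyTorch. It assumes a spatial-transformer-style differentiable crop so that the supervised gaze loss can backpropagate into the region proposal network; the paper's actual architecture and its unsupervised region-selection loss are not specified in the abstract, so the names RegionProposalNet, crop_region, and GazeEstimator, and all layer sizes, are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch (assumption: PyTorch with a differentiable crop via affine
# grid sampling; the paper's actual region-selection loss is not reproduced).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionProposalNet(nn.Module):
    """Predicts crop parameters (center + scale) from the full face image."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 3)  # (tx, ty, scale logit)

    def forward(self, x):
        p = self.head(self.backbone(x))
        tx, ty = torch.tanh(p[:, 0]), torch.tanh(p[:, 1])  # crop center in [-1, 1]
        s = torch.sigmoid(p[:, 2]) * 0.5 + 0.1             # crop scale in [0.1, 0.6]
        return tx, ty, s

def crop_region(img, tx, ty, s, out_size=64):
    """Differentiable crop via affine grid sampling (spatial transformer)."""
    n = img.size(0)
    theta = torch.zeros(n, 2, 3, device=img.device)
    theta[:, 0, 0] = s
    theta[:, 1, 1] = s
    theta[:, 0, 2] = tx
    theta[:, 1, 2] = ty
    grid = F.affine_grid(theta, (n, img.size(1), out_size, out_size),
                         align_corners=False)
    return F.grid_sample(img, grid, align_corners=False)

class GazeEstimator(nn.Module):
    """Regresses the gaze direction (pitch, yaw) from the selected region."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2),
        )

    def forward(self, region):
        return self.net(region)

# End-to-end forward/backward pass: the supervised gaze loss flows back through
# the differentiable crop into the region proposal network, which itself
# receives no direct region supervision.
proposer, estimator = RegionProposalNet(), GazeEstimator()
face = torch.randn(8, 3, 224, 224)   # batch of normalized face images
gaze_gt = torch.randn(8, 2)          # ground-truth pitch/yaw (radians)
tx, ty, s = proposer(face)
region = crop_region(face, tx, ty, s)
loss = F.l1_loss(estimator(region), gaze_gt)
loss.backward()
```

In this sketch the region proposal network is trained purely by gradients of the gaze loss; the paper's contribution of a dedicated loss for unsupervised region selection would be added on top of such a pipeline.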
