ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation

Gaze estimation is a fundamental task in many applications of computer vision, human-computer interaction, and robotics. Many state-of-the-art methods are trained and tested on custom datasets, which makes comparison across methods difficult. Furthermore, existing gaze estimation datasets cover a limited range of head poses and gaze directions, and evaluations are conducted with differing protocols and metrics. In this paper, we propose a new gaze estimation dataset called ETH-XGaze, consisting of over one million high-resolution images of varying gaze under extreme head poses. We collect this dataset from 110 participants with a custom hardware setup that includes 18 digital SLR cameras, adjustable illumination conditions, and a calibrated system to record ground truth gaze targets. We show that our dataset can significantly improve the robustness of gaze estimation methods across different head poses and gaze angles. Additionally, we define a standardized experimental protocol and evaluation metric on ETH-XGaze to better unify gaze estimation research going forward. The dataset and benchmark website are available at this https URL
