Few-Shot Adaptive Gaze Estimation

Inter-personal anatomical differences limit the accuracy of person-independent gaze estimation networks, yet many applications require lower gaze errors than such networks deliver. Further gains can be achieved by personalizing gaze networks, ideally with few calibration samples. However, over-parameterized neural networks are not amenable to learning from few examples, as they can quickly over-fit. We embrace these challenges and propose a novel framework for Few-shot Adaptive GaZE Estimation (FAZE) that learns person-specific gaze networks from very few (≤ 9) calibration samples. FAZE learns a rotation-aware latent representation of gaze via a disentangling encoder-decoder architecture, along with a highly adaptable gaze estimator trained using meta-learning. It can adapt to any new person and yields significant performance gains with as few as 3 samples, reaching state-of-the-art performance of 3.18° on GazeCapture, a 19% improvement over prior art. We open-source our code at https://github.com/NVlabs/few_shot_gaze
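The idea of adapting a gaze estimator from a handful of calibration samples, and of reporting the result as an angular error in degrees, can be illustrated with a small sketch. This is not the FAZE implementation: the linear head, the hand-written latent codes, the constant person-specific offset, the plain gradient-descent fine-tuning (no meta-learning), and the pitch/yaw axis convention are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def pitchyaw_to_vector(pitchyaw):
    """Map (pitch, yaw) angles in radians to unit 3D gaze vectors.

    The axis convention used here is an assumption made for this sketch.
    """
    py = np.atleast_2d(np.asarray(pitchyaw, dtype=float))
    pitch, yaw = py[:, 0], py[:, 1]
    return np.stack([np.cos(pitch) * np.sin(yaw),
                     np.sin(pitch),
                     np.cos(pitch) * np.cos(yaw)], axis=1)

def angular_error_deg(pred, true):
    """Angle in degrees between predicted and true gaze directions."""
    a, b = pitchyaw_to_vector(pred), pitchyaw_to_vector(true)
    cos_sim = np.clip(np.sum(a * b, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos_sim))

def adapt_head(W, b, z_cal, g_cal, lr=0.5, steps=50):
    """Fine-tune a linear gaze head on k calibration pairs.

    Plain gradient descent on the mean squared error; a hypothetical
    stand-in for the meta-learned adaptation described in the abstract.
    """
    W, b = W.copy(), b.copy()
    for _ in range(steps):
        resid = z_cal @ W + b - g_cal            # (k, 2) prediction residuals
        W -= lr * (z_cal.T @ resid) / len(z_cal)
        b -= lr * resid.mean(axis=0)
    return W, b

# Person-independent head (hypothetical): 2-D latent code -> (pitch, yaw).
W0 = np.array([[0.2, 0.0], [0.0, 0.2]])
b0 = np.zeros(2)

# Toy assumption: the new person differs by a constant angular offset.
offset = np.radians([2.0, -3.0])

# k = 3 calibration samples and two held-out samples (made-up latent codes).
z_cal = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
g_cal = z_cal @ W0 + offset
z_test = np.array([[0.5, -0.5], [0.3, 0.8]])
g_test = z_test @ W0 + offset

err_before = angular_error_deg(z_test @ W0 + b0, g_test)
W1, b1 = adapt_head(W0, b0, z_cal, g_cal)
err_after = angular_error_deg(z_test @ W1 + b1, g_test)

print("mean error before adaptation (deg):", err_before.mean())
print("mean error after adaptation (deg):", err_after.mean())
```

In this toy setting three samples suffice because the person-specific difference is a noise-free constant offset. With noisy labels and a high-capacity network, the same fine-tuning over-fits easily, which is the difficulty the abstract raises.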
