Detection and Correspondence Matching of Corneal Reflections for Eye Tracking Using Deep Learning

Eye tracking systems that estimate the point-of-gaze are essential in extended reality (XR) systems, as they enable new interaction paradigms and technological improvements. It is important for these systems to maintain accuracy when the headset moves relative to the head (known as device slippage) due to head movements or user adjustments. One of the most accurate eye tracking techniques, which is also insensitive to shifts of the system relative to the head, uses two or more infrared (IR) light-emitting diodes to illuminate the eye and an IR camera to capture images of the eye. An essential step in estimating the point-of-gaze in these systems is the precise localization of two or more corneal reflections (virtual images of the IR-LEDs that illuminate the eye) in images of the eye. Eye trackers typically use multiple light sources to ensure that at least one pair of reflections is visible for each gaze position. The use of multiple light sources introduces a difficult problem: the need to match each corneal reflection with its corresponding light source over the range of expected eye movements. Corneal reflection detection and matching often fail in XR systems because of the proximity of the camera to the eye and the steep illumination angles of the light sources. These failures arise because corneal reflections vary in shape and intensity, disappear as the eye rotates, or are confounded by spurious reflections. We have developed a fully convolutional neural network, based on the U-Net architecture, that solves the detection and matching problem in the presence of spurious and missing reflections. Eye images of 25 people were collected in a virtual reality headset using a binocular eye tracking module with five infrared light sources per eye. A set of 4,000 eye images was manually labelled for each of the corneal reflections, and data augmentation was used to generate a dataset of 40,000 images. The network correctly identifies and matches 91% of the corneal reflections present in the test set. This performance is comparable to that of a state-of-the-art deep learning system, but our approach requires 33 times less memory and executes 10 times faster. The proposed algorithm, when used in an eye tracker in a VR system, achieved an average mean absolute gaze error of 1°, a significant improvement over state-of-the-art learning-based XR eye tracking systems, which have reported gaze errors of 2-3°.
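
To make the described approach concrete, below is a minimal sketch of a U-Net-style fully convolutional network that treats detection and correspondence matching jointly as per-light-source segmentation, with one output channel per IR-LED plus a background channel. This is an illustrative assumption based on the abstract, not the authors' exact architecture: the framework (TensorFlow/Keras), layer widths, input resolution, and names such as `build_glint_unet` and `NUM_LEDS` are all hypothetical.

```python
# Hypothetical sketch: a small U-Net-style fully convolutional model that maps
# a grayscale eye image to one segmentation channel per IR light source, so
# that detecting a corneal reflection and matching it to its LED happen in a
# single forward pass. Sizes and names are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_LEDS = 5  # one output channel per corneal reflection / light source


def conv_block(x, filters):
    # Two 3x3 convolutions, each followed by batch normalization and ReLU.
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return x


def build_glint_unet(input_shape=(240, 320, 1)):
    inputs = layers.Input(shape=input_shape)

    # Encoder: progressively downsample while widening the feature maps.
    skips = []
    x = inputs
    for filters in (16, 32, 64):
        x = conv_block(x, filters)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)

    # Bottleneck.
    x = conv_block(x, 128)

    # Decoder: upsample and concatenate the corresponding encoder features.
    for filters, skip in zip((64, 32, 16), reversed(skips)):
        x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = conv_block(x, filters)

    # One channel per LED plus a background channel; the softmax over channels
    # assigns every pixel to a specific light source or to the background.
    outputs = layers.Conv2D(NUM_LEDS + 1, 1, activation="softmax")(x)
    return Model(inputs, outputs, name="glint_unet")


if __name__ == "__main__":
    model = build_glint_unet()
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    model.summary()
```

Under this formulation, the pixel coordinates of each corneal reflection could be recovered at inference time as, for example, the centroid or peak of the corresponding output channel; a channel whose maximum response falls below a threshold would indicate that the reflection from that LED is missing for the current eye rotation.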
