Noise-Robust Pupil Center Detection Through CNN-Based Segmentation With Shape-Prior Loss

Detecting the pupil center plays a key role in human-computer interaction, especially for gaze tracking. The conventional deep learning approach to this problem trains a convolutional neural network (CNN) that takes the eye image as input and outputs the pupil center as a regression result. In this paper, we propose an indirect use of a CNN for the task: we first segment the pupil region with a CNN, treating the task as a classification problem, and then find the center of the segmented region. This is based on the observation that a CNN works more robustly for pupil segmentation than for pupil center-point regression when the inputs are noisy infrared (IR) images. Specifically, we use the UNet model to segment pupil regions in IR images and then take the center of mass of the segment as the pupil center. In designing the loss function for the segmentation, we propose a new loss term that encodes a convex shape prior to enhance robustness to noise. Precisely, we penalize not only the deviation of each predicted pixel from the ground-truth label but also non-convex pupil shapes caused by noise and reflections. For training, we build a new dataset of 111,581 images with hand-labeled pupil regions from 29 IR eye video sequences. To validate our method, we also label the commonly used ExCuSe and ElSe datasets, which are regarded as noisy real-world data. Experiments show that the proposed method performs better than conventional methods that find the pupil center directly as a regression result.
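
The two ideas in the abstract, taking the center of mass of a segmented region and penalizing non-convex shapes, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names `pupil_center` and `convexity_penalty`, the 0.5 threshold, and the axis-aligned three-pixel penalty are assumptions chosen only for illustration, and the loss term actually proposed in the paper may be defined differently.

```python
import numpy as np


def pupil_center(mask: np.ndarray, threshold: float = 0.5):
    """Return the (row, col) center of mass of pixels above `threshold`.

    `mask` is a 2-D array of per-pixel pupil probabilities (or 0/1 labels).
    Returns None when no pixel is classified as pupil.
    """
    binary = mask > threshold
    if not binary.any():
        return None
    rows, cols = np.nonzero(binary)
    return float(rows.mean()), float(cols.mean())


def convexity_penalty(prob: np.ndarray, step: int = 1) -> float:
    """Illustrative (assumed) convexity penalty, not the paper's loss.

    For every axis-aligned triple of pixels (a, mid, b) spaced `step` apart,
    add prob[a] * prob[b] * (1 - prob[mid]). In a convex region, the midpoint
    of two pupil pixels must also be pupil, so the term is zero there and
    positive where a hole or notch breaks convexity.
    """
    d = step
    horiz = prob[:, :-2 * d] * prob[:, 2 * d:] * (1.0 - prob[:, d:-d])
    vert = prob[:-2 * d, :] * prob[2 * d:, :] * (1.0 - prob[d:-d, :])
    return float(horiz.sum() + vert.sum())
```

On a toy mask containing a filled 3x3 square, the penalty is zero. Punching a one-pixel hole in the middle of the square, as a specular reflection might, makes the penalty positive, while the center of mass stays at the center of the square.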
