Training Person-Specific Gaze Estimators from User Interactions with Multiple Devices

Learning-based gaze estimation has significant potential to enable attentive user interfaces and gaze-based interaction on the billions of camera-equipped handheld devices and ambient displays. While training accurate person- and device-independent gaze estimators remains challenging, person-specific training is feasible but requires tedious data collection for each target device. To address these limitations, we present the first method to train person-specific gaze estimators across multiple devices. At the core of our method is a single convolutional neural network with shared feature extraction layers and device-specific branches that we train from face images and corresponding on-screen gaze locations. Detailed evaluations on a new dataset of interactions with five common devices (mobile phone, tablet, laptop, desktop computer, smart TV) and three common applications (mobile game, text editing, media center) demonstrate the significant potential of cross-device training. We further explore training with gaze locations derived from natural interactions, such as mouse or touch input.

[1]  Qiang Ji,et al.  3D gaze estimation with a single camera without IR illumination , 2008, 2008 19th International Conference on Pattern Recognition.

[2]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[3]  Raimund Dachselt,et al.  Focus Paragraph Detection for Online Zero-Effort Queries: Lessons learned from Eye-Tracking Data , 2017, CHIIR.

[4]  Andreas Bulling,et al.  Pervasive Attentive User Interfaces , 2016, Computer.

[5]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[6]  Lukasz Kaiser,et al.  One Model To Learn Them All , 2017, ArXiv.

[7]  Hong Va Leong,et al.  StressClick: Sensing Stress from Gaze-Click Patterns , 2016, ACM Multimedia.

[8]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[9]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[10]  Jean-Marc Odobez,et al.  EYEDIAP: a database for the development and evaluation of gaze estimation algorithms from RGB and RGB-D cameras , 2014, ETRA.

[11]  Andrew Blake,et al.  Sparse and Semi-supervised Visual Mapping with the S^3GP , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  Hans Gellersen,et al.  Touch Input and Gaze Correlation on Tablets , 2017, KES-IDT.

[13]  Qiang Ji,et al.  Deep eye fixation map learning for calibration-free eye gaze tracking , 2016, ETRA.

[14]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Nicu Sebe,et al.  Combining Head Pose and Eye Location Information for Gaze Estimation , 2012, IEEE Transactions on Image Processing.

[16]  Ronald E. Rice,et al.  Digital Divides From Access to Activities: Comparing Mobile and Personal Computer Internet Users , 2013 .

[17]  James Hays,et al.  WebGazer: Scalable Webcam Eye Tracking Using User Interactions , 2016, IJCAI.

[18]  Yusuke Sugano,et al.  Self-Calibrating Head-Mounted Eye Trackers Using Egocentric Visual Saliency , 2015, UIST.

[19]  Davis E. King,et al.  Dlib-ml: A Machine Learning Toolkit , 2009, J. Mach. Learn. Res..

[20]  Mario Fritz,et al.  It’s Written All Over Your Face: Full-Face Appearance-Based Gaze Estimation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[21]  Stephen Chi-fai Chan,et al.  Building a Personalized, Auto-Calibrating Eye Tracker from User Interactions , 2016, CHI.

[22]  Yusuke Sugano,et al.  Everyday Eye Contact Detection Using Unsupervised Gaze Target Discovery , 2017, UIST.

[23]  Peter H. Tu,et al.  Learning person-specific models for facial expression and action unit recognition , 2013, Pattern Recognit. Lett..

[24]  Andreas Bulling,et al.  EyeTab: model-based gaze estimation on unmodified tablet computers , 2014, ETRA.

[25]  Qiang Ji,et al.  Probabilistic gaze estimation without active personal calibration , 2011, CVPR 2011.

[26]  Peter Robinson,et al.  Rendering of Eyes for Eye-Shape Registration and Gaze Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[27]  Takahiro Okabe,et al.  Learning gaze biases with head motion for head pose-free gaze estimation , 2014, Image Vis. Comput..

[28]  Brian Kingsbury,et al.  New types of deep neural network learning for speech recognition and related applications: an overview , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[29]  Qiong Huang,et al.  TabletGaze: Unconstrained Appearance-based Gaze Estimation in Mobile Tablets , 2015 .

[30]  Yoichi Sato,et al.  Appearance-Based Gaze Estimation Using Visual Saliency , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Per Ola Kristensson,et al.  The potential of dwell-free eye-typing for fast assistive gaze communication , 2012, ETRA.

[32]  Tomas Pfister,et al.  Learning from Simulated and Unsupervised Images through Adversarial Training , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Yoichi Sato,et al.  Learning-by-Synthesis for Appearance-Based 3D Gaze Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[35]  Yusuke Sugano,et al.  AggreGaze: Collective Estimation of Audience Attention on Public Displays , 2016, UIST.

[36]  Peter Robinson,et al.  Learning an appearance-based gaze estimator from one million synthesised images , 2016, ETRA.

[37]  Xiaoou Tang,et al.  Facial Landmark Detection by Deep Multi-task Learning , 2014, ECCV.

[38]  Peter Robinson,et al.  Constrained Local Neural Fields for Robust Facial Landmark Detection in the Wild , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[39]  Raymond Lee,et al.  An ergonomic evaluation comparing desktop, notebook, and subnotebook computers. , 2002, Archives of physical medicine and rehabilitation.

[40]  Ryen W. White,et al.  User see, user point: gaze and cursor alignment in web search , 2012, CHI.

[41]  Mario Fritz,et al.  Appearance-based gaze estimation in the wild , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Jean-Marc Odobez,et al.  Gaze Estimation in the 3D Space Using RGB-D Sensors , 2015, International Journal of Computer Vision.

[43]  Yoichi Sato,et al.  Appearance-Based Gaze Estimation With Online Calibration From Mouse Operations , 2015, IEEE Transactions on Human-Machine Systems.

[44]  Narendra Ahuja,et al.  Appearance-based eye gaze estimation , 2002, Sixth IEEE Workshop on Applications of Computer Vision, 2002. (WACV 2002). Proceedings..

[45]  Yoichi Sato,et al.  An Incremental Learning Method for Unconstrained Gaze Estimation , 2008, ECCV.

[46]  Wojciech Matusik,et al.  Eye Tracking for Everyone , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).