Mixed Effects Neural Networks (MeNets) With Applications to Gaze Estimation

There is much interest in computer vision in using commodity hardware for gaze estimation. A number of papers have shown that algorithms based on deep convolutional architectures are approaching accuracies at which streaming data from mass-market devices can offer good gaze-tracking performance, although a gap remains between what is possible and the performance users will expect in real deployments. We observe that one obvious avenue for improvement relates to a mismatch between some basic technical assumptions behind most existing approaches and the statistical properties of the data used for training. Specifically, most training datasets involve tens of users with a few hundred (or more) repeated acquisitions per user. The non-i.i.d. nature of this data suggests that better estimation may be possible if the model explicitly made use of such “repeated measurements” from each user, as is commonly done in classical statistical analysis with so-called mixed effects models. The goal of this paper is to adapt these “mixed effects” ideas from statistics within a deep neural network architecture for gaze estimation based on eye images. Such a formulation specifically exploits the hierarchical structure of the training data: each node in the hierarchy is a user who provides tens or hundreds of repeated samples. This modification yields an architecture that offers state-of-the-art performance on several publicly available datasets, improving results by 10-20%.
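
To make the mixed-effects idea concrete: in a classical linear mixed model, the observations from user i are modeled as y_i = X_i * beta + Z_i * b_i + eps_i, where beta is a fixed effect shared by all users and b_i is a random effect specific to user i, estimated from that user's repeated measurements. Below is a minimal PyTorch sketch of how such a decomposition could sit on top of CNN eye-image features. It is an illustration under assumptions, not the authors' implementation; the names (MeNetHead, feat_dim, subject_ids) are hypothetical.

```python
import torch
import torch.nn as nn


class MeNetHead(nn.Module):
    """Hypothetical mixed-effects regression head (illustrative sketch only).

    A shared ("fixed-effect") linear map is applied to CNN features for every
    subject, and each subject additionally gets a small ("random-effect")
    offset that can be fit from that subject's repeated samples.
    """

    def __init__(self, feat_dim: int, out_dim: int, num_subjects: int):
        super().__init__()
        self.fixed = nn.Linear(feat_dim, out_dim)          # shared fixed effects
        self.random = nn.Embedding(num_subjects, out_dim)  # per-subject offsets
        nn.init.zeros_(self.random.weight)                 # random effects start at zero

    def forward(self, features: torch.Tensor, subject_ids: torch.Tensor) -> torch.Tensor:
        # y_ij = W x_ij + b_i : population prediction plus a subject-specific bias
        return self.fixed(features) + self.random(subject_ids)


if __name__ == "__main__":
    head = MeNetHead(feat_dim=128, out_dim=2, num_subjects=15)  # 2-D gaze angles
    feats = torch.randn(32, 128)                                # CNN eye-image features
    ids = torch.randint(0, 15, (32,))                           # subject index for each sample
    print(head(feats, ids).shape)                               # torch.Size([32, 2])
```

In this toy form the per-subject offsets are simply learned jointly by backpropagation; a fuller mixed-effects treatment would place a distributional assumption on the random effects and estimate them separately from the shared weights, in the spirit of classical repeated-measures analysis.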
