Multiview Multitask Gaze Estimation With Deep Convolutional Neural Networks

Gaze estimation, which aims to predict gaze points from given eye images, is an important task in computer vision because of its applications in understanding human visual attention. Many existing methods are based on a single camera, and most of them focus on either gaze point estimation or gaze direction estimation alone. In this paper, we propose a novel multitask method for gaze point estimation using multiview cameras. Specifically, we analyze the close relationship between gaze point estimation and gaze direction estimation, and we use a partially shared convolutional neural network architecture to estimate the gaze direction and the gaze point simultaneously. Furthermore, we introduce a new multiview gaze tracking data set that consists of multiview eye images of different subjects. To our knowledge, it is the largest multiview gaze tracking data set. Comprehensive experiments on our multiview gaze tracking data set and on existing data sets demonstrate that our multiview multitask gaze point estimation solution consistently outperforms existing methods.
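The relationship between the two tasks can be illustrated with a minimal sketch. It is not the formulation used in the paper: it assumes a hypothetical coordinate frame in which the screen lies in the plane z = 0, treats the gaze point as the intersection of the gaze ray with that plane, and combines the two errors with a hypothetical weight. The names gaze_point_on_plane, multitask_loss, and weight are illustrative only.

```python
import math


def gaze_point_on_plane(eye, direction):
    """Intersect the gaze ray eye + t * direction with the plane z = 0.

    Assumption: the screen is the plane z = 0 and the eye sits at z != 0.
    Returns the (x, y) gaze point on that plane.
    """
    ex, ey, ez = eye
    dx, dy, dz = direction
    if dz == 0:
        raise ValueError("gaze ray is parallel to the screen plane")
    t = -ez / dz
    if t <= 0:
        raise ValueError("gaze ray points away from the screen plane")
    return (ex + t * dx, ey + t * dy)


def angular_error(u, v):
    """Angle in radians between two non-zero 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos)))


def multitask_loss(pred_point, true_point, pred_dir, true_dir, weight=0.5):
    """Illustrative joint objective: point error plus weighted angular error.

    Assumption: a simple weighted sum; the paper's actual loss may differ.
    """
    point_err = math.hypot(pred_point[0] - true_point[0],
                           pred_point[1] - true_point[1])
    return point_err + weight * angular_error(pred_dir, true_dir)
```

For example, gaze_point_on_plane((0.0, 0.0, 600.0), (0.5, -0.25, -1.0)) places the gaze point at (300.0, -150.0) on the assumed screen plane. Because the gaze point is determined by the eye position and the gaze direction under this geometry, supervising both outputs from shared features is a natural design choice.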
