Combining first-person and third-person gaze for attention recognition

This paper presents a method for recognizing attentional behaviors in triadic interactions using a head-mounted binocular eye tracker. By taking advantage of the first-person view, we simultaneously estimate the first-person and third-person gaze. The first-person gaze is computed using an appearance-based method relying on local features. In parallel, head pose tracking provides a coarse gaze estimate for the people visible in the scene camera. Finally, from the first- and third-person gaze directions, we compute scores that are used to assign an attention pattern to each frame. Our contributions are the following: (i) head pose estimation based on localized regression, (ii) attention analysis, in particular of mutual and shared gaze, that includes the first-person gaze, and (iii) experiments conducted using a head-mounted appearance-based gaze tracker. Experiments on recorded data show encouraging results.
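
To make the per-frame scoring step concrete, the following is a minimal sketch and not the method used in the paper, whose score definitions are not given in this abstract. It assumes that each person has a 3D position and a gaze direction expressed in a common frame, scores "looking at" as the cosine between the gaze direction and the direction to the target, and takes the minimum of two such scores for mutual and shared gaze. The function names, the cosine score, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
import math


def _norm(v):
    """Euclidean length of a vector given as a sequence of floats."""
    return math.sqrt(sum(x * x for x in v))


def looking_score(origin, gaze_dir, target):
    """Cosine between a gaze direction and the direction origin -> target.

    Illustrative assumption: 1.0 means the gaze points straight at the
    target, values near 0 or below mean it points elsewhere.
    """
    to_target = [t - o for t, o in zip(target, origin)]
    denom = _norm(gaze_dir) * _norm(to_target)
    if denom == 0.0:
        return 0.0
    return sum(g * d for g, d in zip(gaze_dir, to_target)) / denom


def mutual_gaze_score(pos_a, gaze_a, pos_b, gaze_b):
    """High only when A looks at B and B looks at A (hypothetical definition)."""
    return min(looking_score(pos_a, gaze_a, pos_b),
               looking_score(pos_b, gaze_b, pos_a))


def shared_gaze_score(pos_a, gaze_a, pos_b, gaze_b, target):
    """High only when A and B both look at the same target (hypothetical)."""
    return min(looking_score(pos_a, gaze_a, target),
               looking_score(pos_b, gaze_b, target))


def label_frame(scores, threshold=0.9):
    """Return the best-scoring pattern name, or "none" if it is below threshold."""
    name, best = max(scores.items(), key=lambda kv: kv[1])
    return name if best >= threshold else "none"


# Example frame: the wearer (first person) and one person seen in the scene camera.
wearer_pos, wearer_gaze = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
other_pos, other_gaze = (0.0, 0.0, 2.0), (0.0, 0.0, -1.0)
obj_pos = (1.0, 0.0, 1.0)

frame_scores = {
    "mutual": mutual_gaze_score(wearer_pos, wearer_gaze, other_pos, other_gaze),
    "shared": shared_gaze_score(wearer_pos, wearer_gaze, other_pos, other_gaze, obj_pos),
}
frame_label = label_frame(frame_scores)
```

In this example the two people face each other, so the mutual score dominates and the frame is labeled accordingly. In practice the third-person gaze would be the coarse, head-pose-derived estimate described above, so a looser threshold for that term may be appropriate.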
