A deep learning approach for robust head pose independent eye movements recognition from videos

Recognizing eye movements is important for understanding gaze behavior, for example in human communication analysis (human-human or human-robot interaction) or for diagnosis (medical conditions, reading impairments). In this paper, we address this task using remote RGB-D sensors to analyze people behaving in natural conditions. This is very challenging: such sensors have a typical sampling rate of 30 Hz and provide low-resolution eye images (typically 36×60 pixels), and natural scenarios introduce many sources of variability in illumination, shadows, head pose, and dynamics. The gaze signals that can be extracted under these conditions therefore have lower precision than those of dedicated IR eye trackers, rendering previous methods less appropriate for the task. To tackle these challenges, we propose a deep learning method that directly processes the eye image video streams, classifying them into fixation, saccade, and blink classes, and distinguishing irrelevant noise (illumination changes, low-resolution artifacts, inaccurate eye alignment, difficult eye shapes) from true eye motion signals. Experiments on natural 4-party interactions demonstrate the benefit of our approach over previous methods, including deep learning models applied to gaze outputs.
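The "previous methods" the abstract contrasts against typically operate on an already-extracted gaze signal rather than on the eye images themselves, e.g. velocity-threshold classification (I-VT). As a hedged illustration of that family of baselines (not the paper's method), a minimal NumPy sketch might look as follows; the threshold value and the NaN-as-blink convention are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def ivt_classify(gaze, fps=30.0, saccade_thresh=100.0):
    """Label each gaze sample as 'fixation', 'saccade', or 'blink'.

    gaze: (N, 2) array of gaze angles in degrees (yaw, pitch);
          NaN rows mark frames where the eye was not detected
          (treated here as blinks -- an illustrative convention).
    saccade_thresh: angular velocity threshold in deg/s.
    """
    gaze = np.asarray(gaze, dtype=float)
    labels = np.full(len(gaze), "fixation", dtype=object)

    # Frames with no valid gaze estimate are labeled as blinks.
    blink = np.isnan(gaze).any(axis=1)
    labels[blink] = "blink"

    # Per-frame angular velocity in deg/s, padded so len(vel) == len(gaze).
    vel = np.linalg.norm(np.diff(gaze, axis=0), axis=1) * fps
    vel = np.concatenate([[0.0], vel])

    # Frames moving faster than the threshold (and not blinks) are saccades;
    # NaN velocities around blinks compare False and fall back to 'fixation'.
    labels[(vel > saccade_thresh) & ~blink] = "saccade"
    return labels
```

At a 30 Hz sampling rate, a single-frame displacement of 4° corresponds to 120 deg/s, so such a sample would be labeled a saccade under the 100 deg/s threshold above. The abstract's argument is precisely that, on low-precision RGB-D gaze signals, thresholded velocities of this kind confuse sensor noise with true eye motion, which motivates classifying the eye image stream directly instead.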
