Predicting Intention Through Eye Gaze Patterns

Eye movement is a valuable (and, in several cases, the only remaining) means of communication for people with severely limited motor or communication capabilities. In this paper, we present a new framework that uses eye gaze patterns as input to predict a user's intention to perform daily tasks. The proposed framework consists of two main modules. First, by clustering the eye gaze patterns, the regions of interest (ROIs) on the displayed image are extracted. A deep convolutional neural network is then trained and used to recognize the objects in each ROI. Finally, the intended task is predicted with a support vector machine (SVM) that learns the relationships among the recognized objects. The proposed framework is tested on data from 8 subjects in an experiment covering 4 intended tasks, as well as a scenario in which the user has no specific intention when looking at the displayed image. Results demonstrate an average accuracy of 95.68% across all tasks, confirming the efficacy of the proposed framework.
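The pipeline described above (gaze clustering → ROI extraction → per-ROI object recognition → SVM task prediction) can be sketched in scikit-learn-style Python. This is a minimal illustration, not the paper's implementation: the gaze points and object-presence features are synthetic placeholders, the CNN stage is replaced by a comment, and all parameter values (e.g. the DBSCAN `eps`) are assumptions for the toy data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic gaze fixations (pixel coordinates): two dense clusters
# around looked-at objects, plus a few scattered stray samples.
cluster_a = rng.normal(loc=(200, 150), scale=10, size=(40, 2))
cluster_b = rng.normal(loc=(600, 400), scale=10, size=(40, 2))
stray = rng.uniform(low=0, high=800, size=(10, 2))
gaze = np.vstack([cluster_a, cluster_b, stray])

# Step 1: cluster gaze points into regions of interest with DBSCAN;
# label -1 marks noise points that belong to no cluster.
labels = DBSCAN(eps=30, min_samples=5).fit_predict(gaze)
rois = []
for k in set(labels) - {-1}:
    pts = gaze[labels == k]
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    rois.append((x0, y0, x1, y1))  # bounding box of one ROI

# Step 2 (stand-in): a CNN would classify the image patch inside each
# ROI; here we simply assume a binary object-presence vector per trial.
# Step 3: an SVM maps the object-presence vector to an intended task.
X = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]])  # objects seen
y = np.array([0, 1, 2, 3])                                  # task labels
clf = SVC(kernel="linear").fit(X, y)
pred = clf.predict([[1, 1, 0]])
```

With this toy data, DBSCAN recovers the two fixation clusters as ROIs, and the SVM stage shows how the recognized-object features feed the final task classifier; in the real framework the features would come from the trained CNN rather than hand-coded vectors.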
