Semantic fovea: real-time annotation of ego-centric videos with gaze context

Visual context plays a crucial role in understanding human visual attention in natural, unconstrained tasks: the objects we look at during everyday activities indicate our ongoing attention. Collecting, interpreting, and studying visual behaviour in unconstrained environments is therefore necessary, yet it presents many challenges and has traditionally required painstaking hand-coding. Here we demonstrate a proof-of-concept system that enables real-time annotation of objects in an egocentric video stream from head-mounted eye-tracking glasses, while concurrently obtaining a live stream of the user's gaze vectors with respect to their own visual field. Even during dynamic, fast-paced interactions, our system was able to recognise all objects in the user's field of view with moderate accuracy. To validate our concept, we used the system to annotate an in-lab breakfast scenario in real time.
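The abstract does not give implementation details, but its core idea, matching the live gaze point against per-frame object detections to label the attended object, can be sketched as follows. This is a minimal illustration and not the authors' code: the Detection structure, the normalised [0, 1] coordinate convention, and the tie-breaking by detector confidence are all assumptions made for the example.

from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Detection:
    """One object detection in a video frame (hypothetical structure).

    Box coordinates are assumed normalised to [0, 1] relative to the
    frame size, so they can be compared directly with a normalised
    gaze point.
    """
    label: str
    score: float                              # detector confidence
    box: Tuple[float, float, float, float]    # (x_min, y_min, x_max, y_max)

def attended_object(detections: List[Detection],
                    gaze: Tuple[float, float]) -> Optional[Detection]:
    """Return the detection whose bounding box contains the gaze point.

    If several boxes contain the gaze point (overlapping objects),
    prefer the highest-confidence detection; return None when the
    gaze falls on no detected object.
    """
    gx, gy = gaze
    hits = [d for d in detections
            if d.box[0] <= gx <= d.box[2] and d.box[1] <= gy <= d.box[3]]
    return max(hits, key=lambda d: d.score) if hits else None

# Example: during a breakfast scene, the gaze lands inside the 'cup' box,
# which also overlaps the larger 'table' box.
frame_detections = [
    Detection("cup",   0.91, (0.40, 0.55, 0.55, 0.80)),
    Detection("table", 0.88, (0.00, 0.50, 1.00, 1.00)),
]
att = attended_object(frame_detections, gaze=(0.47, 0.65))
print(att.label if att else "no object at gaze")   # -> cup

In a full pipeline, the detections would be produced by an object detector run on each egocentric video frame, and the gaze point would come from the eye-tracking glasses' calibrated gaze estimate; the confidence tie-break could equally be replaced by, for example, preferring the smallest containing box.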
