Automatic Recognition and Augmentation of Attended Objects in Real-time using Eye Tracking and a Head-mounted Display

Scanning and processing the visual stimuli in a scene is essential for the human brain to make situation-aware decisions. Giving intelligent mobile user interfaces the ability to observe this scanning behavior and scene processing can enable a new class of cognition-aware user interfaces. As a first step in this direction, we implement an augmented reality (AR) system that classifies objects at the user's point of regard, detects visual attention to them, and augments the real objects with virtual labels that stick to the objects in real time. Our prototype runs on a head-mounted AR device (Microsoft HoloLens 2) with integrated eye tracking and a front-facing camera.
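
The pipeline described above (classify what is at the point of regard, then decide whether the user is attending to it) can be illustrated with a minimal Python sketch. It is not the implementation of the prototype: the function `crop_box`, the class `DwellDetector`, the 224-pixel patch size, and the 0.5 s dwell threshold are all assumptions made for illustration, and the classifier that turns an image patch into a label is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def crop_box(gaze_x: int, gaze_y: int, frame_w: int, frame_h: int,
             size: int = 224) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a size x size patch centred on
    the gaze point, shifted where necessary so it stays inside the frame.
    The patch size is an assumed value, not one taken from the paper."""
    half = size // 2
    left = min(max(gaze_x - half, 0), max(frame_w - size, 0))
    top = min(max(gaze_y - half, 0), max(frame_h - size, 0))
    return left, top, min(left + size, frame_w), min(top + size, frame_h)


@dataclass
class DwellDetector:
    """Hypothetical attention rule: an object counts as attended once the
    same label has been observed continuously for threshold_s seconds."""
    threshold_s: float = 0.5          # assumed dwell threshold
    _label: Optional[str] = None      # label currently being tracked
    _since: float = 0.0               # time the current label first appeared

    def update(self, label: Optional[str], t: float) -> Optional[str]:
        """Feed one classified gaze sample taken at time t (seconds).
        Returns the label if it has been held long enough, else None."""
        if label != self._label:
            # Gaze moved to a different object (or to nothing): restart timer.
            self._label, self._since = label, t
            return None
        if label is not None and t - self._since >= self.threshold_s:
            return label
        return None
```

In such a sketch, each camera frame would be cropped with `crop_box`, the patch passed to an image classifier, and the resulting label fed to `DwellDetector.update`; a non-`None` return value would be the trigger for placing a virtual label on the object.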
