Automated capture and delivery of assistive task guidance with an eyewear computer: the GlaciAR system

In this paper we describe and evaluate an assistive mixed reality system that aims to augment users in tasks by combining automated, unsupervised information collection with minimally invasive video guides. The result is a fully self-contained system that we call GlaciAR (Glass-enabled Contextual Interactions for Augmented Reality). It operates by extracting contextual interactions from observations of users performing actions. GlaciAR is able to i) automatically determine moments of relevance based on a head motion attention model, ii) automatically produce video guidance information, iii) trigger these guides based on an object detection method, iv) learn without supervision from observing multiple users, and v) operate fully on-board a current eyewear computer (Google Glass). We describe the components of GlaciAR together with user evaluations on three tasks. We see this work as a first step toward scaling up authoring, a notoriously difficult problem in guidance systems, and as an exploration of enhancing users' natural abilities via minimally invasive visual cues.
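To make the pipeline concrete, below is a minimal sketch of how a head motion attention model could gate on-device capture, assuming that gyroscope magnitude dwelling below a threshold signals a moment of relevance. The class name, threshold, and dwell window are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import deque

# Illustrative constants (assumptions, not values from the paper).
STILLNESS_THRESHOLD = 0.35   # rad/s: head considered "still" below this
DWELL_SAMPLES = 30           # ~1 s of gyroscope samples at 30 Hz


class HeadMotionAttentionGate:
    """Flags moments of relevance when head motion stays low for a dwell window."""

    def __init__(self, threshold=STILLNESS_THRESHOLD, dwell=DWELL_SAMPLES):
        self.threshold = threshold
        self.window = deque(maxlen=dwell)

    def update(self, gx, gy, gz):
        """Feed one gyroscope sample; return True while attention is inferred."""
        magnitude = math.sqrt(gx * gx + gy * gy + gz * gz)
        self.window.append(magnitude < self.threshold)
        # Attention is inferred only once the entire dwell window is "still".
        return len(self.window) == self.window.maxlen and all(self.window)


if __name__ == "__main__":
    gate = HeadMotionAttentionGate()
    # Toy stream: head movement followed by stillness.
    samples = [(1.2, 0.4, 0.1)] * 20 + [(0.05, 0.02, 0.01)] * 40
    for i, (gx, gy, gz) in enumerate(samples):
        if gate.update(gx, gy, gz):
            print(f"sample {i}: attention detected -> start/continue guide capture")
```

In a full system such a gate would start and stop the recording of short video guides, which an object detection step could later trigger for playback when the same objects reappear in the wearer's view.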
