Multi-User Egocentric Online System for Unsupervised Assistance on Object Usage

We present an online, fully unsupervised approach that automatically extracts video guides of how objects are used from egocentric footage captured by wearable gaze trackers worn by multiple users. Given egocentric video and eye gaze from multiple users performing tasks, the system discovers task-relevant objects and automatically extracts guidance videos showing how these objects have been used. In assistive mode, we propose a method for selecting a suitable video guide to display to a novice user, indicating how to use an object, triggered purely by the user’s gaze. The approach is tested on a variety of daily tasks, ranging from opening a door to preparing coffee and operating a gym machine.

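Since the abstract describes the assistive mode only at a high level, the following is a minimal, hypothetical sketch of how a gaze-dwell trigger for guide selection might look: when the user’s gaze rests long enough on a discovered task-relevant object, the guidance video previously extracted for that object is returned for display. The class and function names (GazeSample, ObjectRegion, select_guide) and the dwell threshold are illustrative assumptions, not the paper’s method; object discovery, gaze tracking and guide extraction are assumed to be provided elsewhere.

```python
# Hypothetical sketch of gaze-triggered guide selection; not the authors' implementation.
from dataclasses import dataclass
from typing import Optional


@dataclass
class GazeSample:
    t: float   # timestamp in seconds
    x: float   # gaze x in image coordinates
    y: float   # gaze y in image coordinates


@dataclass
class ObjectRegion:
    object_id: str    # identifier of a discovered task-relevant object
    x0: float         # bounding box, top-left
    y0: float
    x1: float         # bounding box, bottom-right
    y1: float
    guide_video: str  # path to the extracted guidance video clip


def contains(region: ObjectRegion, g: GazeSample) -> bool:
    """True if the gaze point falls inside the object's bounding box."""
    return region.x0 <= g.x <= region.x1 and region.y0 <= g.y <= region.y1


def select_guide(gaze: list[GazeSample],
                 regions: list[ObjectRegion],
                 dwell_s: float = 0.5) -> Optional[str]:
    """Return the guide video of the first object the user's gaze dwells on
    for at least dwell_s seconds of consecutive samples; None otherwise.
    The dwell threshold is an assumed parameter, not taken from the paper."""
    start: dict[str, float] = {}  # object_id -> time the current dwell began
    for g in gaze:
        for r in regions:
            if contains(r, g):
                start.setdefault(r.object_id, g.t)
                if g.t - start[r.object_id] >= dwell_s:
                    return r.guide_video
            else:
                start.pop(r.object_id, None)  # gaze left the object: reset dwell
    return None


if __name__ == "__main__":
    coffee = ObjectRegion("coffee_machine", 100, 80, 220, 200,
                          "guides/coffee_machine.mp4")
    gaze = [GazeSample(t=i * 0.1, x=150, y=120) for i in range(10)]
    print(select_guide(gaze, [coffee]))  # -> guides/coffee_machine.mp4
```

In a full system, `regions` would come from the discovered task-relevant objects in the current view and `gaze` from the wearable tracker’s live stream; the dwell test is just one plausible way to realise a purely gaze-driven trigger.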