You-Do, I-Learn: Egocentric unsupervised discovery of objects and their modes of interaction towards video-based guidance

Highlights:
- Discovering task-relevant objects from egocentric video sequences of multiple users, using appearance, position, motion and attention features.
- Distinguishing the different ways in which a task-relevant object has been used.
- Automatically extracting usage snippets, to be used for video-based guidance.
- Tested on a variety of daily tasks such as initialising a printer, preparing a coffee and setting up a gym machine.

This paper presents an unsupervised approach to automatically extracting video-based guidance on object usage from egocentric video and wearable gaze tracking, collected from multiple users while performing tasks. The approach (i) discovers task-relevant objects, (ii) builds a model for each, (iii) distinguishes the different ways in which each discovered object has been used and (iv) discovers the dependencies between object interactions. The work investigates appearance, position, motion and attention features, and presents results using each feature individually as well as a combination of the relevant features. Moreover, an online, scalable approach is presented and compared to the offline results. The paper also proposes a method for selecting a suitable video guide to display to a novice user, indicating how to use an object and triggered purely by the user's gaze. The assistive mode can additionally recommend the object to be used next, based on the learnt sequence of object interactions. The approach was tested on a variety of daily tasks such as initialising a printer, preparing a coffee and setting up a gym machine.
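To make the pipeline concrete, the sketch below illustrates one possible (heavily simplified) reading of steps (i) and (iii): gaze-attended regions, each described by a feature vector combining appearance, position, motion and attention cues, are clustered into candidate task-relevant objects, and each object's segments are then sub-clustered into interaction modes. This is an illustrative assumption-laden sketch, not the authors' implementation; the feature dimensions, cluster counts and names such as attended_regions and modes_per_object are hypothetical placeholders, and the features are stubbed with random data.

```python
# Minimal sketch: object discovery and usage-mode discovery by clustering.
# All inputs are synthetic; a real system would extract features from video.
import numpy as np
from sklearn.cluster import SpectralClustering, KMeans

rng = np.random.default_rng(0)

# Hypothetical per-fixation feature vectors (appearance + position + motion + attention).
n_fixations, feat_dim = 500, 64
attended_regions = rng.normal(size=(n_fixations, feat_dim))

# Step (i): discover candidate task-relevant objects by clustering the
# gaze-attended regions (cluster count assumed known here for simplicity).
n_objects = 5
object_labels = SpectralClustering(
    n_clusters=n_objects, affinity="nearest_neighbors", random_state=0
).fit_predict(attended_regions)

# Step (iii): for each discovered object, distinguish modes of interaction by
# clustering the segments in which it was used (e.g. "fill" vs. "pour" a kettle).
modes_per_object = {}
for obj in range(n_objects):
    segment_features = attended_regions[object_labels == obj]
    if len(segment_features) == 0:
        continue
    n_modes = min(2, len(segment_features))
    modes_per_object[obj] = KMeans(
        n_clusters=n_modes, n_init=10, random_state=0
    ).fit_predict(segment_features)

# Guidance step (sketched): when a novice's gaze dwells on a discovered object,
# a representative usage snippet for the relevant mode could be replayed.
for obj, modes in modes_per_object.items():
    print(f"object {obj}: {len(modes)} attended segments, "
          f"{len(set(modes))} interaction mode(s)")
```

In this toy setup the number of objects and modes is fixed in advance; the paper instead estimates suitable cluster counts automatically, so the hard-coded values above should be read only as placeholders.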
