You-Do, I-Learn: Unsupervised Multi-User Egocentric Approach Towards Video-Based Guidance

This paper presents an unsupervised approach to automatically extracting video-based guidance on object usage from egocentric video and wearable gaze tracking collected from multiple users performing tasks. The approach (i) discovers task-relevant objects, (ii) builds a model for each, (iii) distinguishes the different ways in which each discovered object has been used, and (iv) discovers the dependencies between object interactions. The work investigates appearance, position, motion and attention features, and presents results using each individually as well as a combination of the relevant features. Moreover, an online scalable approach is presented and compared to the offline results. The paper proposes a method for selecting a suitable video guide to display to a novice user, indicating how to use an object and triggered purely by the user's gaze. The potential assistive mode can also recommend the object to be used next, based on the learnt sequence of object interactions. The approach was tested on a variety of daily tasks, such as initialising a printer, preparing coffee and setting up a gym machine.
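To make the pipeline concrete, the sketch below illustrates step (i), the discovery of task-relevant objects, as unsupervised clustering of descriptors extracted around gaze fixations. This is a minimal sketch under stated assumptions, not the paper's implementation: the descriptors are stubbed with random data, and the specific choices here (scikit-learn's SpectralClustering, model selection by minimising the Davies-Bouldin cluster-separation measure, the helper name discover_objects, and the range of candidate cluster counts) are assumptions made for illustration.

    import numpy as np
    from sklearn.cluster import SpectralClustering
    from sklearn.metrics import davies_bouldin_score

    def discover_objects(features, k_range=range(2, 10)):
        # Cluster gaze-attended region descriptors into candidate
        # task-relevant objects; the number of clusters is chosen by
        # minimising the Davies-Bouldin separation measure.
        best_k, best_labels, best_score = None, None, np.inf
        for k in k_range:
            labels = SpectralClustering(
                n_clusters=k,
                affinity="nearest_neighbors",
                n_neighbors=10,
                random_state=0,
            ).fit_predict(features)
            score = davies_bouldin_score(features, labels)
            if score < best_score:
                best_k, best_labels, best_score = k, labels, score
        return best_k, best_labels

    # Stand-in for appearance + position descriptors of regions
    # sampled around gaze fixation points (hypothetical input).
    rng = np.random.default_rng(0)
    features = rng.random((200, 36))
    k, labels = discover_objects(features)

In an assistive mode, each discovered cluster would then be linked to the video snippets in which its object was used, so that a novice user's fixation on a matching region can trigger the corresponding guidance clip.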
