Recognising Egocentric Activities from Gaze Regions with Multiple-Voting Bag of Words

We present a system that aims to recognise activities from an egocentric perspective, where the prime sources of information are gradient regions around the wearer's gaze fixations. Inspired by evidence from vision research on the gaze patterns of people performing manual tasks, we assess how well an existing real-time region description method performs on a dataset of about 200 video sequences recorded with a wearable gaze tracker. We evaluate the traditional bag-of-words classification approach, but also introduce and evaluate a weighted multiple-voting scheme. We model an activity as a record of the visual landmarks fixated as the person progresses through its steps. Our method shows encouraging results on 11 classes of manual and household activities, with the multiple-voting scheme nearly doubling the hit rate.
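As a rough illustration of the multiple-voting idea (the abstract does not specify the exact weighting, so the vocabulary, descriptors, neighbour count and weighting function below are assumptions), the following sketch builds a soft-assignment bag-of-words histogram in which each gaze-region descriptor casts distance-weighted votes for its k nearest visual words instead of a single hard vote.

```python
import numpy as np

def multi_vote_bow(descriptors, vocabulary, k=3):
    """Bag-of-words histogram with multiple weighted votes per descriptor.

    descriptors : (N, D) array of region descriptors around gaze fixations
    vocabulary  : (V, D) array of visual-word centres (e.g. from k-means)
    k           : number of nearest words each descriptor votes for
    """
    hist = np.zeros(len(vocabulary))
    for d in descriptors:
        dists = np.linalg.norm(vocabulary - d, axis=1)   # distance to every visual word
        nearest = np.argsort(dists)[:k]                  # k closest words
        weights = 1.0 / (dists[nearest] + 1e-6)          # closer words receive larger votes
        hist[nearest] += weights / weights.sum()         # each descriptor contributes one vote in total
    return hist / max(len(descriptors), 1)               # normalise so sequences of different length are comparable
```

The resulting histogram (one per video sequence) could then be fed to any standard classifier; the soft assignment is what lets ambiguous fixation regions support several visual words rather than committing to one.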
