Functional object descriptors for human activity modeling

The ability to learn from human demonstration is essential for robots operating in human environments. The activity models that the robot builds from observation must take both the human motion and the objects involved into account. Object models designed for this purpose should reflect the role of the object in the activity, that is, its function or affordances. The main contribution of this paper is to represent objects directly in terms of their interaction with human hands, rather than in terms of appearance. This enables a direct representation of object affordances and function, while remaining robust to intra-class differences in appearance. Object hypotheses are first extracted from a video sequence as tracks of associated image segments. Each object hypothesis is then encoded as a string, where the vocabulary corresponds to different types of interaction with human hands. The similarity between two such object descriptors is measured using a string kernel. Experiments show that these functional descriptors capture differences and similarities in object affordance and function that are not represented by appearance.
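To make the string-based comparison concrete, the sketch below shows how two object hypotheses, each encoded as a string of hand-interaction symbols, could be compared with a string kernel. This is a minimal illustration only: it uses a simple p-spectrum kernel as a stand-in, and the interaction vocabulary, symbol strings, and substring length are invented for the example; the paper's actual vocabulary and kernel variant are not reproduced here.

```python
from collections import Counter
from math import sqrt

# Hypothetical interaction vocabulary (illustrative only), one symbol per frame:
#   'n' = no contact, 'a' = hand approaching, 'g' = grasp,
#   'm' = object moved by hand, 'r' = release

def spectrum_kernel(s, t, p=3):
    """p-spectrum string kernel: dot product of substring-count vectors."""
    cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return sum(cs[k] * ct[k] for k in cs if k in ct)

def normalized_kernel(s, t, p=3):
    """Cosine-normalized kernel value, so self-similarity equals 1."""
    k_st = spectrum_kernel(s, t, p)
    k_ss = spectrum_kernel(s, s, p)
    k_tt = spectrum_kernel(t, t, p)
    return k_st / sqrt(k_ss * k_tt) if k_ss and k_tt else 0.0

# Two made-up object hypotheses encoded as interaction strings:
cup_track   = "nnaaggmmmmrrnn"
knife_track = "nnaaggmmrrnnaa"
print(normalized_kernel(cup_track, knife_track, p=3))
```

Such a kernel value can then feed a kernel-based classifier or clustering of object hypotheses by function rather than appearance.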
