Recognizing object affordances in terms of spatio-temporal object-object relationships

In this paper we describe a probabilistic framework that models the interactions between multiple objects in a scene. We present a spatio-temporal feature that encodes the pairwise interactions between every pair of objects in the scene. Using a kernel representation, we embed object interactions in a vector space, which allows us to define a metric for comparing interactions of different temporal extent. Using this metric, we define a probabilistic model that represents and extracts the affordances of individual objects based on the structure of their interactions. We focus here on pairwise relationships, but the model extends naturally to additional cues related to a single object or to multiple objects. We compare our approach with traditional kernel approaches and show a significant improvement.
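To make the idea of a kernel-induced metric over interactions of different length concrete, the following is a minimal sketch and not the method of the paper. The abstract does not specify the feature or the kernel, so everything here is an assumption made for illustration: the per-frame feature is simply the distance between two object positions, and the sequence kernel is an average of pairwise RBF values between frames (a mean-embedding kernel), which ignores temporal ordering. The names `relative_distance_track`, `sequence_kernel`, `kernel_distance`, and the parameter `gamma` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def relative_distance_track(track_a: Sequence[Vector],
                            track_b: Sequence[Vector]) -> List[List[float]]:
    """Assumed pairwise feature: per-frame Euclidean distance between two objects."""
    return [[math.dist(p, q)] for p, q in zip(track_a, track_b)]


def rbf(u: Vector, v: Vector, gamma: float = 1.0) -> float:
    """RBF kernel between two per-frame feature vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq)


def sequence_kernel(x: Sequence[Vector], y: Sequence[Vector],
                    gamma: float = 1.0) -> float:
    """Average pairwise RBF value between all frames of x and all frames of y.

    This is a stand-in kernel (assumption): it is positive semi-definite and
    accepts sequences of different length, but it discards frame order.
    """
    total = sum(rbf(u, v, gamma) for u in x for v in y)
    return total / (len(x) * len(y))


def kernel_distance(x: Sequence[Vector], y: Sequence[Vector],
                    gamma: float = 1.0) -> float:
    """Distance induced by the kernel: sqrt(k(x,x) + k(y,y) - 2 k(x,y))."""
    sq = (sequence_kernel(x, x, gamma)
          + sequence_kernel(y, y, gamma)
          - 2.0 * sequence_kernel(x, y, gamma))
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


# Two interactions of different temporal extent (2 frames vs. 3 frames).
short_interaction = [[0.0], [1.0]]
long_interaction = [[0.0], [1.0], [2.0]]
d = kernel_distance(short_interaction, long_interaction)
```

Because the distance is computed purely from kernel evaluations, the two sequences never need to be resampled to a common length. A kernel that respects temporal ordering would replace `sequence_kernel` while leaving `kernel_distance` unchanged.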
