Categorizing object-action relations from semantic scene graphs

In this work we introduce a novel approach for detecting spatiotemporal object-action relations, leading to both action recognition and object categorization. Semantic scene graphs are extracted from image sequences and used to find the characteristic main graphs of the action sequence via an exact graph-matching technique. This yields an event table of the action scene, from which object-action relations can be extracted. The method is applied to several artificial and real action scenes containing limited context. The central novelty of this approach is that it is model free and requires no a priori representation of either objects or actions. Essentially, actions are recognized without requiring prior object knowledge, and objects are categorized solely on the basis of the role they exhibit within an action sequence. Thus, this approach is grounded in the affordance principle, which has recently attracted much attention in robotics and provides a way forward for trial-and-error learning of object-action relations through repeated experimentation. It may therefore be useful for recognition and categorization tasks, for example in imitation learning in developmental and cognitive robotics.
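The pipeline described above (scene graphs per frame, main graphs, event table) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: segments are already tracked with stable integer ids, the only relation encoded is a hypothetical "touching" edge, and exact graph matching is replaced by a plain equality test on labeled edge sets, which is only valid because the ids are assumed consistent across frames. All names (`make_graph`, `main_graphs`, `event_table`) are illustrative.

```python
from typing import Dict, FrozenSet, List, Tuple

# A scene graph is modeled as a set of undirected edges between tracked
# segment ids. An edge means "these two segments touch" in that frame.
# (Assumption: the real method may encode richer spatial relations.)
Edge = Tuple[int, int]
SceneGraph = FrozenSet[Edge]


def make_graph(touching_pairs: List[Edge]) -> SceneGraph:
    """Normalize edges so that (a, b) and (b, a) denote the same edge."""
    return frozenset((min(a, b), max(a, b)) for a, b in touching_pairs)


def main_graphs(frames: List[SceneGraph]) -> List[Tuple[int, SceneGraph]]:
    """Keep a frame only if its graph differs from the last kept graph.

    Assumption: with stable segment ids, comparing edge sets stands in
    for the exact graph-matching step named in the text.
    """
    kept: List[Tuple[int, SceneGraph]] = []
    for index, graph in enumerate(frames):
        if not kept or graph != kept[-1][1]:
            kept.append((index, graph))
    return kept


def event_table(frames: List[SceneGraph]) -> Dict[Edge, List[int]]:
    """One row per segment pair, one column per main graph (1 = touching)."""
    kept = main_graphs(frames)
    pairs = sorted(set().union(*(graph for _, graph in kept)))
    return {pair: [1 if pair in graph else 0 for _, graph in kept]
            for pair in pairs}


# Toy sequence with hypothetical ids: 1 = hand, 2 = cup, 3 = table.
frames = [
    make_graph([(2, 3)]),          # cup rests on table
    make_graph([(3, 2)]),          # unchanged (edge order is irrelevant)
    make_graph([(1, 2), (2, 3)]),  # hand touches cup
    make_graph([(1, 2)]),          # cup lifted off the table
    make_graph([(1, 2)]),          # unchanged
]
table = event_table(frames)
```

In this toy sequence the row for the pair (2, 3) switches from touching to not touching while the row for (1, 2) switches on. Under these assumptions, rows of this kind characterize the action, and the pattern of rows a segment takes part in describes its role without any object model.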
