First and second order dynamics in a hierarchical SOM system for action recognition

Humans recognize the actions of other humans very efficiently, relying on patterns of movement. Our theoretical starting point is that the dynamics of the joint movements are important for action categorization. On the basis of this theory, we present a novel action recognition system that employs a hierarchy of Self-Organizing Maps together with a custom supervised neural network that learns to categorize actions. The system preprocesses the input from a Kinect-like 3D camera to exploit information not only about joint positions, but also about their first- and second-order dynamics. We evaluate our system in two experiments with publicly available datasets, and compare its performance with that obtained using less sophisticated preprocessing of the input. The results show that including the dynamics of the actions improves performance. We also apply an attention mechanism that focuses on the parts of the body that are most involved in performing the actions.
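As an illustration of what exploiting first- and second-order dynamics can mean in practice, the sketch below augments each frame of joint positions with frame-to-frame differences (a velocity estimate) and differences of those differences (an acceleration estimate). This is only a minimal sketch under stated assumptions: the abstract does not specify how the dynamics are computed, so the use of simple finite differences, the flat per-frame coordinate layout, and the function names `finite_differences` and `dynamics_features` are all hypothetical choices made for this example.

```python
def finite_differences(frames):
    """Return the differences between consecutive frames.

    `frames` is a list of equal-length lists of floats, one list per frame
    (assumed layout: flattened x, y, z coordinates of every joint).
    """
    return [
        [curr_value - prev_value for prev_value, curr_value in zip(prev, curr)]
        for prev, curr in zip(frames, frames[1:])
    ]


def dynamics_features(positions):
    """Concatenate position, first-order and second-order differences.

    Assumption: finite differences stand in for the first- and second-order
    dynamics; the original system may compute them differently.
    The first two frames are dropped because no second-order difference
    exists for them.
    """
    velocities = finite_differences(positions)       # N - 1 frames
    accelerations = finite_differences(velocities)   # N - 2 frames
    return [
        position + velocity + acceleration
        for position, velocity, acceleration in zip(
            positions[2:], velocities[1:], accelerations
        )
    ]


# Usage: a single joint moving along the x axis over four frames.
joint_positions = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [3.0, 0.0, 0.0],
    [6.0, 0.0, 0.0],
]
features = dynamics_features(joint_positions)
```

Each resulting feature vector is three times as long as the original position vector, so a downstream map or classifier sees position, velocity and acceleration information for the same frame side by side.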
