A compositional approach for 3D arm-hand action recognition

In this paper we propose a fast and reliable vision-based framework for 3D arm-hand action modelling, learning and recognition in human-robot interaction scenarios. The architecture consists of a compositional model that divides the arm-hand action recognition problem into three levels. The bottom level is based on a simple but sufficiently accurate algorithm for the computation of the scene flow. The middle level serves to classify action primitives through descriptors obtained from 3D Histogram of Flow (3D-HOF); we further apply a sparse coding (SC) algorithm to deal with noise. Action Primitives are then modelled and classified by linear Support Vector Machines (SVMs), and we propose an on-line algorithm to cope with the real-time recognition of primitive sequences. The top level system synthesises combinations of primitives by means of a syntactic approach. In summary the main contribution of the paper is an incremental method for 3D arm-hand behaviour modelling and recognition, fully implemented and tested on the iCub robot, allowing it to learn new actions after a single demonstration.

[1]  Daniel Cremers,et al.  Stereoscopic Scene Flow Computation for 3D Motion Understanding , 2011, International Journal of Computer Vision.

[2]  Toby Sharp,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR.

[3]  J.K. Aggarwal,et al.  Human activity analysis , 2011, ACM Comput. Surv..

[4]  Frederic Devernay,et al.  A Variational Method for Scene Flow Estimation from Stereo Sequences , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[5]  Junji Yamato,et al.  Recognizing human action in time-sequential images using hidden Markov model , 1992, Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  E. Mark Gold,et al.  Language Identification in the Limit , 1967, Inf. Control..

[7]  Michael Elad,et al.  Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries , 2006, IEEE Transactions on Image Processing.

[8]  A. Kolmogorov Three approaches to the quantitative definition of information , 1968 .

[9]  Giulio Sandini,et al.  The iCub humanoid robot: an open platform for research in embodied cognition , 2008, PerMIS.

[10]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Thomas Brox,et al.  Universität Des Saarlandes Fachrichtung 6.1 – Mathematik Highly Accurate Optic Flow Computation with Theoretically Justified Warping Highly Accurate Optic Flow Computation with Theoretically Justified Warping , 2022 .

[12]  C. Breazeal,et al.  Robots that imitate humans , 2002, Trends in Cognitive Sciences.

[13]  Ramakant Nevatia,et al.  Single View Human Action Recognition using Key Pose Matching and Viterbi Path Searching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Ramakant Nevatia,et al.  Coupled Hidden Semi Markov Models for Activity Recognition , 2007, 2007 IEEE Workshop on Motion and Video Computing (WMVC'07).

[15]  Wanqing Li,et al.  Action recognition based on a bag of 3D points , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[16]  François Denis,et al.  Learning Regular Languages from Simple Positive Examples , 2001, Machine Learning.

[17]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[18]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[19]  Gunnar Farnebäck,et al.  Two-Frame Motion Estimation Based on Polynomial Expansion , 2003, SCIA.

[20]  Rajat Raina,et al.  Efficient sparse coding algorithms , 2006, NIPS.

[21]  Ronald Poppe,et al.  A survey on vision-based human action recognition , 2010, Image Vis. Comput..

[22]  W. Eric L. Grimson,et al.  Adaptive background mixture models for real-time tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[23]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[24]  Heiko Hirschmüller,et al.  Stereo Processing by Semiglobal Matching and Mutual Information , 2008, IEEE Trans. Pattern Anal. Mach. Intell..

[25]  Berthold K. P. Horn,et al.  Determining Optical Flow , 1981, Other Conferences.

[26]  Christopher W. Geib,et al.  The meaning of action: a review on action recognition and mapping , 2007, Adv. Robotics.

[27]  Le Li,et al.  SENSC: a Stable and Efficient Algorithm for Nonnegative Sparse Coding: SENSC: a Stable and Efficient Algorithm for Nonnegative Sparse Coding , 2009 .

[28]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[29]  Ilaria Gori,et al.  Arm-Hand Behaviours Modelling: From Attention to Imitation , 2010, ISVC.

[30]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .