Interactive body part contrast mining for human interaction recognition

The recognition of multi-person interactions remains challenging because of mutual occlusion and redundant pose information. We propose a joint-based interactive body part contrast mining method for human interaction recognition. To describe interactions efficiently, we introduce an interactive body part model that connects the limbs of different participants to represent the relationships among interactive body parts. We then compute spatio-temporal joint features for eight interactive limb pairs over short frame sets (poselets) to describe motion. Using contrast mining, we identify the essential interactive pairs and poselets for each interaction class, discarding redundant action information, and use these poselets to build a poselet dictionary for interaction representation under the bag-of-words framework. An SVM with an RBF kernel is adopted for recognition. We evaluate the proposed algorithm on two databases: the SBU interaction database and a newly collected RGBD-skeleton interaction database. Experimental results demonstrate the effectiveness of the proposed algorithm: recognition accuracy reaches 85.4% on our interaction database and 86.8% on the SBU interaction database, 6% higher than the method in [1].
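The final stages of the pipeline (quantizing per-clip poselet features against a learned dictionary, then classifying the resulting bag-of-words histograms with an RBF-kernel SVM) can be sketched as below. This is a minimal illustration on synthetic data, not the authors' implementation: the feature dimensions, dictionary size, and the `encode_bow` helper are all hypothetical placeholders standing in for the paper's spatio-temporal joint features and contrast-mined poselet dictionary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def encode_bow(clip_features, dictionary):
    """Quantize one clip's poselet feature vectors against the dictionary
    and return a normalized word-count histogram (bag-of-words encoding)."""
    words = dictionary.predict(clip_features)
    hist = np.bincount(words, minlength=dictionary.n_clusters).astype(float)
    return hist / hist.sum()

# Toy data: each "clip" yields 15 poselet feature vectors of dimension 8
# (hypothetical sizes); two interaction classes drawn from shifted Gaussians.
rng = np.random.default_rng(0)
clips, labels = [], []
for label, center in [(0, -1.0), (1, 1.0)]:
    for _ in range(20):
        clips.append(center + 0.3 * rng.standard_normal((15, 8)))
        labels.append(label)

# Build the poselet dictionary by clustering all training poselets,
# then encode every clip as a histogram over dictionary words.
dictionary = KMeans(n_clusters=16, n_init=10, random_state=0).fit(np.vstack(clips))
X = np.array([encode_bow(c, dictionary) for c in clips])

# SVM with RBF kernel over the bag-of-words histograms.
clf = SVC(kernel="rbf", gamma="scale").fit(X, labels)
acc = clf.score(X, labels)
```

In the paper, the dictionary is restricted to the essential poselets selected by contrast mining rather than clustering all features indiscriminately; the sketch omits that selection step.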

[1] Yi Yang, et al. Recognizing proxemics in personal photos. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

[2] Youtian Du, et al. Human Interaction Representation and Recognition Through Motion Decomposition. IEEE Signal Processing Letters, 2007.

[3] Jake K. Aggarwal, et al. Recognition of Composite Human Activities through Context-Free Grammar Based Representation. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2006.

[4] Luc Van Gool, et al. Does Human Action Recognition Benefit from Pose Estimation? BMVC, 2011.

[5] Atsushi Shimada, et al. Contribution estimation of participants for human interaction recognition. 2013.

[6] Jake K. Aggarwal, et al. An Overview of Contest on Semantic Description of Human Activities (SDHA) 2010. ICPR Contests, 2010.

[7] Yunde Jia, et al. Learning Human Interaction by Interactive Phrases. ECCV, 2012.

[8] Jake K. Aggarwal, et al. A hierarchical framework for understanding human-human interactions in video surveillance. IS&T/SPIE Electronic Imaging, 2005.

[9] Dimitris Samaras, et al. Two-person interaction detection using body-pose features and multiple instance learning. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2012.

[10] Luc Van Gool, et al. Variations of a Hough-Voting Action Recognition System. ICPR Contests, 2010.

[11] Alan L. Yuille, et al. An Approach to Pose-Based Action Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.

[12] Wei Guo, et al. Efficient Interaction Recognition through Positive Action Representation. 2013.

[13] Jake K. Aggarwal, et al. Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities. IEEE International Conference on Computer Vision (ICCV), 2009.