Category-Blind Human Action Recognition: A Practical Recognition System

Existing human action recognition systems for 3D sequences obtained from depth cameras are designed to handle only one action category, either single-person actions or two-person interactions, and are difficult to extend to scenarios where both categories co-exist. In this paper, we propose the category-blind human action recognition method (CHARM), which recognizes a human action without assuming its action category. In CHARM, we represent each action class (either a single-person action or a two-person interaction) by a co-occurrence pattern of motion primitives. We then classify an action instance by matching its motion-primitive co-occurrence pattern against each class representation, formulating the matching task as a maximum clique problem. We conduct extensive evaluations of CHARM on three datasets covering single-person actions, two-person interactions, and their mixtures. Experimental results show that CHARM performs favorably against several state-of-the-art methods designed for single-person actions or two-person interactions, without making explicit assumptions about action category.
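The abstract formulates matching as a maximum clique problem: candidate correspondences between observed and class motion primitives form graph nodes, pairwise-compatible correspondences form edges, and the best match is the largest mutually compatible set. As a minimal illustration of the underlying graph problem (not the authors' implementation, whose graph construction and weighting are not described here), the following sketch finds a maximum clique with the Bron-Kerbosch algorithm:

```python
def max_clique(adj):
    """Return one maximum clique of an undirected graph.

    adj: dict mapping each node to the set of its neighbours.
    Uses Bron-Kerbosch with pivoting; exponential worst case,
    which is expected since maximum clique is NP-hard.
    """
    best = []

    def bron_kerbosch(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        # Pivot on the vertex with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            bron_kerbosch(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    bron_kerbosch(set(), set(adj), set())
    return best

# Toy compatibility graph: nodes 1-3 are mutually compatible
# correspondences; node 4 is compatible only with node 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(max_clique(adj)))  # → [1, 2, 3]
```

In a matching setting, the reported clique would be read as the largest self-consistent assignment of observed primitives to one class's primitives; repeating this per class and comparing clique quality yields a classification decision.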
