Multiple Kernel Learning and Optical Flow for Action Recognition in RGB-D Video

Recognizing human action is valuable for many real world applications such as video surveillance, human computer interaction, smart home and gaming. In this paper, we present a method of action recognition based on hypothesizing that the classification of action can be boosted by motion information using optical flow. Emergence of automatic RGBD video analysis, we propose fusing optical flow is extracted from both RGB and depth channels for action representation. Firstly, we extract optical flow from RGB and depth data. Secondly, motion descriptor with spatial pyramid is computed from histogram of optical flow of RGB and depth. Then, feature pooling technique is used in order to accumulate RGB and depth feature into set of feature vectors for each action. Finally, we use the Multiple Kernel Learning (MKL) technique at the kernel level for action classification from RGB and depth feature pooling. To demonstrate generalizability, our proposed method has been systematically evaluated on two benchmark datasets shown to be more effective and accurate for action recognition compared to the previous work. We obtain overall accuracies of: 97.5 % and 92.8 % with our proposed method on the 3D Action Pairs and MSR-Daily Activity 3D dataset, respectively.

[1]  J.K. Aggarwal,et al.  Human activity analysis , 2011, ACM Comput. Surv..

[2]  Ying Wu,et al.  Mining actionlet ensemble for action recognition with depth cameras , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Cheng Hong,et al.  RGB-Depth feature for 3D human activity recognition , 2013, China Communications.

[4]  Wanqing Li,et al.  Action recognition based on a bag of 3D points , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[5]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Luis Salgado,et al.  Efficient spatio-temporal hole filling strategy for Kinect depth maps , 2012, Electronic Imaging.

[7]  Ling Shao,et al.  Learning Discriminative Representations from RGB-D Video Data , 2013, IJCAI.

[8]  Ying Wu,et al.  Robust 3D Action Recognition with Random Occupancy Patterns , 2012, ECCV.

[9]  Xinghua Sun,et al.  Action recognition via local descriptors and holistic features , 2009, 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[10]  Xiaodong Yang,et al.  Super Normal Vector for Activity Recognition Using Depth Sequences , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[12]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[13]  M. Turk,et al.  Eigenfaces for Recognition , 1991, Journal of Cognitive Neuroscience.

[14]  Luc Van Gool,et al.  An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector , 2008, ECCV.

[15]  Cordelia Schmid,et al.  A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.

[16]  Mubarak Shah,et al.  A 3-dimensional sift descriptor and its application to action recognition , 2007, ACM Multimedia.

[17]  Jake K. Aggarwal,et al.  Human Motion Analysis: A Review , 1999, Comput. Vis. Image Underst..

[18]  Mario Fernando Montenegro Campos,et al.  STOP: Space-Time Occupancy Patterns for 3D Action Recognition from Depth Map Sequences , 2012, CIARP.

[19]  Xiaodong Yang,et al.  Recognizing actions using depth motion maps-based histograms of oriented gradients , 2012, ACM Multimedia.

[20]  Gunnar Farnebäck,et al.  Two-Frame Motion Estimation Based on Polynomial Expansion , 2003, SCIA.

[21]  Cordelia Schmid,et al.  Dense Trajectories and Motion Boundary Descriptors for Action Recognition , 2013, International Journal of Computer Vision.

[22]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[23]  Zicheng Liu,et al.  HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Qing Zhang,et al.  A Survey on Human Motion Analysis from Depth Data , 2013, Time-of-Flight and Depth Imaging.

[25]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.