Combining appearance and structural features for human action recognition

In this paper, we propose to integrate structural information with appearance features for human action recognition. In local representations based on detected spatio-temporal interest points (STIPs), the layout of the STIPs carries important cues about the motion structure of a video sequence and is assumed to complement appearance features. We aim to incorporate this structure into the description of STIPs by combining it with appearance features for action representation. Building on previous work on the 3D shape context, we present an optimised version of the 3D shape context that encodes the layout information of STIPs. By combining the proposed optimised 3D shape context with appearance descriptors, e.g., HOG3D and 3D gradients, we provide a more informative and discriminative description of STIPs for action classification. To validate the proposed descriptor, we have conducted extensive experiments on the KTH and UCF YouTube datasets. The results show that the optimised 3D shape context offers information complementary to appearance features, demonstrating its effectiveness for action representation; moreover, the proposed descriptor yields results comparable to those of state-of-the-art methods.
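To make the idea of combining layout with appearance concrete, the following is a minimal sketch of a generic shape-context-style histogram over STIP positions, concatenated with an appearance vector. It is not the optimised 3D shape context proposed in the paper, whose details the abstract does not give. The function names (`shape_context_3d`, `combine`), the bin counts, the binning scheme (radial bins that widen with distance, azimuth in the image plane, and two temporal bins), and the `weight` parameter are all assumptions made for illustration.

```python
import math

def shape_context_3d(points, index, n_r=3, n_theta=4):
    """Histogram of where the other (x, y, t) points lie relative to points[index].

    Hypothetical binning: n_r radial bins, n_theta azimuth bins in the
    image plane, and 2 temporal bins (earlier / same-or-later frame).
    """
    cx, cy, ct = points[index]
    offsets = [(x - cx, y - cy, t - ct)
               for j, (x, y, t) in enumerate(points) if j != index]
    hist = [0.0] * (n_r * n_theta * 2)
    if not offsets:
        return hist

    dists = [math.sqrt(dx * dx + dy * dy + dt * dt) for dx, dy, dt in offsets]
    r_max = max(dists) or 1.0  # guard against all points coinciding

    for (dx, dy, dt), d in zip(offsets, dists):
        # log2(1 + d / r_max) lies in [0, 1]; outer bins cover wider ranges.
        r_bin = min(int(math.log2(1.0 + d / r_max) * n_r), n_r - 1)
        # Azimuth in the image plane, mapped to [0, 2*pi).
        theta = math.atan2(dy, dx) % (2 * math.pi)
        th_bin = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
        # Temporal bin: 0 if the neighbour occurs earlier, 1 otherwise.
        t_bin = 0 if dt < 0 else 1
        hist[(r_bin * n_theta + th_bin) * 2 + t_bin] += 1.0

    # Normalise so the histogram does not depend on the number of STIPs.
    total = float(len(offsets))
    return [v / total for v in hist]

def combine(appearance, structure, weight=1.0):
    """Concatenate an appearance descriptor with a weighted layout histogram."""
    return list(appearance) + [weight * v for v in structure]

# Toy example: four interest points given as (x, y, t).
points = [(0, 0, 0), (1, 0, 0), (0, 1, 1), (-1, 0, -1)]
layout = shape_context_3d(points, 0)
descriptor = combine([0.5, 0.5], layout)  # [0.5, 0.5] stands in for HOG3D
```

Simple concatenation is only one way to fuse the two cues; the relative weighting, or a different fusion strategy altogether, would need to be chosen and validated on the target dataset.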
