Recognizing human actions using a new descriptor based on spatial-temporal interest points and weighted-output classifier

Abstract The bag of interest points (BIP) approach is an effective strategy for human action recognition, but it discards much of the information contained in the spatial-temporal interest points (STIPs), and that lost information is helpful for classification. In this paper, a new action descriptor based on STIPs is proposed: the histogram of interest point locations (HIPL). HIPL reorganizes the STIPs to reflect their spatial location information and can be viewed as a useful supplement to the BIP feature. Multiple features, including BIP and HIPL, are extracted to describe human actions. Combining them directly, however, easily leads to over-fitting because the dimension of the resulting feature vector is too high. To overcome this problem, a novel classifier combination framework is developed to integrate the multiple features, with AdaBoost and sparse representation (SR) used as the basic algorithms. Experiments on the KTH and UCF sports datasets, two benchmarks in human action recognition, show that our results are either comparable to or significantly better than previously published results on these benchmarks.
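To make the idea of a location-based descriptor concrete, the following is a minimal sketch, not the construction used in the paper (the abstract does not specify it). It assumes that interest points are given as (x, y) pixel coordinates inside the frame and that a fixed grid of cells is a reasonable way to summarize where they fall. The function name `location_histogram`, the grid size, and the sample points are all hypothetical.

```python
def location_histogram(points, frame_width, frame_height, bins_x=4, bins_y=4):
    """Bin (x, y) interest-point locations into a normalized grid histogram.

    Assumption: a plain spatial grid over the frame; the paper's HIPL
    descriptor may organize locations differently.
    """
    hist = [0.0] * (bins_x * bins_y)
    for x, y in points:
        # Map each coordinate to a grid cell, clamping points on the far edge.
        col = min(int(x / frame_width * bins_x), bins_x - 1)
        row = min(int(y / frame_height * bins_y), bins_y - 1)
        hist[row * bins_x + col] += 1.0
    total = sum(hist)
    # Normalize so that videos with different numbers of points are comparable.
    return [h / total for h in hist] if total else hist


# Hypothetical interest-point locations in a 160x120 frame.
stips = [(10, 10), (12, 15), (150, 110), (100, 30)]
feature = location_histogram(stips, 160, 120, bins_x=2, bins_y=2)
```

A vector of this kind carries only where the points fall, which is the sort of information a bag-of-words histogram over point appearance does not retain.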
