Human action recognition based on boosted feature selection and naive Bayes nearest-neighbor classification

In this paper we propose an AdaBoost-based feature selection method for action recognition. Instead of detecting spatio-temporal interest points and using a 'bag of features' approach, we densely sample descriptors, either 3D-SIFT or 3D-HOG, and use AdaBoost to select the most discriminative subset. We obtain maximal accuracy with just 200 of the 3217 possible raw 3D features from each video sequence. Using the extremely simple naive Bayes nearest-neighbor (NBNN) classifier with the most discriminative 3D-SIFT features, we obtain accuracies of 92.7%, 99.4%, 92.3% and 38.1% on the KTH, Weizmann, IXMAS and HMDB51 datasets, respectively. We also observe that the errors are fairly evenly distributed across the different action classes.
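
As a rough illustration of the NBNN decision rule in its usual formulation (not the authors' implementation), the sketch below assigns a query video to the class whose pooled training descriptors minimize the summed squared nearest-neighbor distance over the query's descriptors. The function name `nbnn_classify`, the 8-dimensional toy descriptors, and the class labels are assumptions made for this example; in the setting described above, the inputs would be the boosting-selected 3D-SIFT or 3D-HOG descriptors.

```python
import numpy as np

def nbnn_classify(query_descriptors, class_descriptors):
    """Hypothetical helper: return the label with the lowest NBNN cost.

    query_descriptors: array of shape (n_query, dim)
    class_descriptors: dict mapping label -> array of shape (n_pool, dim)
    """
    best_label, best_cost = None, np.inf
    for label, pool in class_descriptors.items():
        # Squared Euclidean distance from every query descriptor
        # to every descriptor pooled from this class.
        diffs = query_descriptors[:, None, :] - pool[None, :, :]
        sq_dists = np.sum(diffs ** 2, axis=2)
        # NBNN cost: sum of each query descriptor's nearest-neighbor
        # distance within this class.
        cost = sq_dists.min(axis=1).sum()
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label

# Toy data (assumed, for illustration only): two well-separated classes.
rng = np.random.default_rng(0)
class_descriptors = {
    "walk": rng.normal(0.0, 0.1, size=(50, 8)),
    "run": rng.normal(1.0, 0.1, size=(50, 8)),
}
query = rng.normal(1.0, 0.1, size=(20, 8))
predicted = nbnn_classify(query, class_descriptors)
```

The brute-force distance matrix keeps the sketch short; with thousands of descriptors per class, an approximate nearest-neighbor index would likely be needed instead.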
