Action recognition using salient neighboring histograms

Combining spatio-temporal interest points with Bag-of-Words models achieves state-of-the-art performance in action recognition. However, existing methods based on “bag-of-words” models either are too local to capture the variance in space/time or fail to solve the ambiguity problem in spatial and temporal dimensions. Instead, we propose a salient vocabulary construction algorithm to select visual words from a global point of view, and form compact descriptors to represent discriminative histograms in the neighborhoods. Those salient neighboring histograms are then trained to model different actions. Our approach yields a competitive result on the KTH dataset compare to state-of-the-art methods. On the more challenging UCF Sports dataset, we obtain 95.21%, which is approximately 4% better than the current best published results.

[1]  S. Gong,et al.  Recognising action as clouds of space-time interest points , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Cordelia Schmid,et al.  Action recognition by dense trajectories , 2011, CVPR 2011.

[3]  Dong Xu,et al.  Action recognition using context and appearance distribution features , 2011, CVPR 2011.

[4]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[5]  Jake K. Aggarwal,et al.  Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[6]  J.K. Aggarwal,et al.  Human activity analysis , 2011, ACM Comput. Surv..

[7]  Bruce A. Draper,et al.  Scalable action recognition with a subspace forest , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Tsuhan Chen,et al.  Spatio-Temporal Phrases for Activity Recognition , 2012, ECCV.

[9]  A. Fuller,et al.  Stability of Motion , 1976, IEEE Transactions on Systems, Man, and Cybernetics.

[10]  Andrew Gilbert,et al.  Fast realistic multi-action recognition using mined dense spatio-temporal features , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[11]  Adriana Kovashka,et al.  Learning a hierarchy of discriminative space-time neighborhood features for human action recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12]  Cordelia Schmid,et al.  Actions in context , 2009, CVPR.

[13]  Christopher Joseph Pal,et al.  Activity recognition using the velocity histories of tracked keypoints , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[14]  Jiebo Luo,et al.  Recognizing realistic actions from videos “in the wild” , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[16]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[17]  Mubarak Shah,et al.  Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[19]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[20]  Thomas B. Moeslund,et al.  Selective spatio-temporal interest points , 2012, Comput. Vis. Image Underst..

[21]  Cordelia Schmid,et al.  Will person detection help bag-of-features action recognition? , 2010 .

[22]  Rama Chellappa,et al.  Machine Recognition of Human Activities: A Survey , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[23]  Jintao Li,et al.  Hierarchical spatio-temporal context modeling for action recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Solomon Lefschetz,et al.  Stability by Liapunov's Direct Method With Applications , 1962 .

[25]  Subhransu Maji,et al.  Classification using intersection kernel support vector machines is efficient , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[27]  Cordelia Schmid,et al.  Actions in context , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Manik Varma,et al.  More generality in efficient multiple kernel learning , 2009, ICML '09.

[29]  Cordelia Schmid,et al.  Evaluation of Local Spatio-temporal Features for Action Recognition , 2009, BMVC.