Time-ordered spatial-temporal interest points for human action classification

Human action classification, which is vital for content-based video retrieval and human-machine interaction, remains difficult when different actions appear similar. Previous works typically detect spatial-temporal interest points (STIPs) from action sequences and then adopt the bag-of-visual-words (BoVW) model to describe actions as numerical statistics of STIPs. Despite its robustness, the BoVW model ignores the spatial-temporal layout of STIPs, leading to misclassification among different types of actions that share similar STIP statistics. Motivated by this, a time-ordered feature is designed to describe the temporal distribution of STIPs, providing structural information complementary to the traditional BoVW model. Moreover, a temporal refinement method is used to eliminate intra-class variations among time-ordered features caused by performers' individual habits. A time-ordered BoVW model is then built to represent actions, encoding both the numerical statistics and the temporal distribution of STIPs. Extensive experiments on three challenging datasets, i.e., KTH, Rochester, and UT-Interaction, validate the effectiveness of our method in distinguishing similar actions.

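To make the representation concrete, the following is a minimal Python sketch of one plausible time-ordered BoVW encoding: a global visual-word histogram is concatenated with per-segment histograms computed over uniform temporal segments, so two videos with identical overall word counts but different temporal orderings of STIPs yield different features. The function name `time_ordered_bovw`, the uniform segmentation, and the nearest-neighbor word assignment are illustrative assumptions, not the paper's exact formulation (which additionally applies a temporal refinement step).

```python
import numpy as np

def time_ordered_bovw(stip_descriptors, stip_times, codebook, num_segments=3):
    """Illustrative sketch of a time-ordered BoVW feature for one video.

    stip_descriptors: (N, D) array of STIP descriptors (assumes N >= 1)
    stip_times:       (N,) array of frame indices where each STIP occurs
    codebook:         (K, D) array of visual-word centers (e.g., from k-means)
    """
    # Assign each STIP to its nearest visual word (hard assignment).
    dists = np.linalg.norm(
        stip_descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    K = codebook.shape[0]

    # Global BoVW histogram: the numerical statistics of STIPs.
    global_hist = np.bincount(words, minlength=K).astype(float)

    # Temporal distribution: split the sequence into ordered segments
    # and build one word histogram per segment.
    t_min, t_max = stip_times.min(), stip_times.max()
    edges = np.linspace(t_min, t_max + 1e-6, num_segments + 1)
    segment_hists = []
    for s in range(num_segments):
        mask = (stip_times >= edges[s]) & (stip_times < edges[s + 1])
        segment_hists.append(
            np.bincount(words[mask], minlength=K).astype(float))

    # Concatenate and L1-normalize: the result encodes both the counts
    # and the time ordering of STIPs.
    feat = np.concatenate([global_hist] + segment_hists)
    return feat / max(feat.sum(), 1e-12)
```

In this sketch, swapping the first and last segments of a video changes the concatenated feature even though the global histogram is unchanged, which is precisely the structural information a plain BoVW representation discards.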