Discriminative Action States Discovery for Online Action Recognition

In this paper, we propose an approach to online human action recognition in which videos are represented by frame-level descriptors. To address the large intra-class variation of frame-level descriptors, we introduce an action states discovery method that uncovers the different distributions of these descriptors while training a classifier. The positive sample set is treated as multiple clusters, called action states. The action states model can be learned effectively by clustering the positive samples and optimizing the decision boundary of each state simultaneously. Experimental results show that our method not only outperforms state-of-the-art methods but also predicts the action of an ongoing video in real time.
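The alternating scheme the abstract describes (cluster the positive frame-level descriptors into action states, then optimize a decision boundary per state) can be sketched as below. This is a minimal illustrative stand-in, not the paper's actual formulation: the random state initialization, the ridge-regression per-state classifier, and all function names are assumptions chosen to keep the sketch self-contained.

```python
import numpy as np

def fit_action_states(X_pos, X_neg, n_states=2, n_iters=10, reg=1e-3, seed=0):
    """Alternate between (1) fitting one linear boundary per action state
    and (2) reassigning each positive sample to its best-scoring state.
    Ridge regression on +/-1 labels is used as a simple linear classifier
    (an illustrative stand-in for the paper's learned boundaries)."""
    rng = np.random.default_rng(seed)
    aug = lambda X: np.hstack([X, np.ones((len(X), 1))])  # append bias term
    Xp, Xn = aug(X_pos), aug(X_neg)
    d = Xp.shape[1]
    # Random initial assignment of positives to states (k-means is another option)
    assign = rng.integers(0, n_states, size=len(Xp))
    W = np.zeros((n_states, d))
    for _ in range(n_iters):
        # Step 1: refit each state's boundary against all negatives
        for s in range(n_states):
            Xs = Xp[assign == s]
            if len(Xs) == 0:
                continue  # keep the previous boundary for an empty state
            X = np.vstack([Xs, Xn])
            y = np.concatenate([np.ones(len(Xs)), -np.ones(len(Xn))])
            W[s] = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)
        # Step 2: reassign each positive to the state that scores it highest
        assign = (Xp @ W.T).argmax(axis=1)
    return W

def score(W, X):
    """Positive score of each sample = max response over all action states."""
    Xa = np.hstack([X, np.ones((len(X), 1))])
    return (Xa @ W.T).max(axis=1)
```

Because each state keeps its own linear boundary, a positive class made of several disjoint descriptor clusters (e.g. different phases of one action) can still be separated from negatives even when no single hyperplane could do so.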