Video Classification Based on Spatial Gradient and Optical Flow Descriptors

Feature point detection and local feature extraction are the two critical steps in trajectory-based methods for video classification. This paper proposes to detect trajectories by tracking spatiotemporal feature points in salient regions rather than over the entire frame. This strategy significantly reduces noisy feature points in the background, lowering computational cost and increasing the discriminative power of the feature set. Two new spatiotemporal descriptors, namely STOH and RISTOH, are proposed to describe the spatiotemporal characteristics of the moving object. The proposed feature point detection and local feature extraction method is applied to human action recognition and evaluated on three video datasets: KTH, YouTube, and Hollywood2. The results show that the proposed method achieves a higher classification rate even when it uses only half as many feature points as the dense sampling approach. Moreover, features extracted from the curvature of the motion surface are more discriminative than features extracted from the spatial gradient.
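To make the sampling strategy concrete, the sketch below (not the authors' implementation) restricts feature point detection to salient regions and links the points across frames with pyramidal Lucas-Kanade optical flow to form short trajectories. OpenCV's spectral-residual saliency and the mean-value threshold are stand-in assumptions for whatever saliency model the paper uses, the parameters (maxCorners, trajectory length, etc.) are illustrative, and the STOH/RISTOH descriptor computation itself is omitted because its exact formulation is not given here.

```python
# Minimal sketch: trajectories from feature points detected only in salient regions.
# Assumes opencv-contrib-python (for cv2.saliency); all thresholds/parameters are illustrative.
import cv2
import numpy as np

def salient_trajectories(video_path, track_len=15):
    cap = cv2.VideoCapture(video_path)
    saliency = cv2.saliency.StaticSaliencySpectralResidual_create()

    ok, frame = cap.read()
    if not ok:
        return []
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Detect corners only where the saliency map is above its mean value
    # (a simple stand-in for the paper's salient-region selection).
    _, sal_map = saliency.computeSaliency(prev_gray)
    mask = (sal_map > sal_map.mean()).astype(np.uint8) * 255
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=5, mask=mask)
    tracks = [[tuple(p.ravel())] for p in pts] if pts is not None else []

    finished = []
    while tracks:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Track every live point one frame forward with pyramidal Lucas-Kanade flow.
        prev_pts = np.float32([t[-1] for t in tracks]).reshape(-1, 1, 2)
        next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, prev_pts, None)

        new_tracks = []
        for t, p, st in zip(tracks, next_pts, status):
            if st[0] == 0:          # point lost: drop the trajectory
                continue
            t.append(tuple(p.ravel()))
            if len(t) >= track_len:  # trajectory reached its target length
                finished.append(t)
            else:
                new_tracks.append(t)
        tracks = new_tracks
        prev_gray = gray

    cap.release()
    return finished + tracks  # each trajectory is a list of (x, y) points over time
```

In the full pipeline described in the abstract, each surviving trajectory would then be described by the STOH/RISTOH histograms built from spatial gradients and optical flow in the spatiotemporal volume around it before classification.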
