Residue boundary histograms for action recognition in the compressed domain

Traditional action recognition approaches are too slow for real-time or large-scale applications. Prior work has addressed this problem by replacing optical flow with motion vectors taken from the compressed domain. However, the compressed domain holds further information that is useful for action recognition: discrete cosine transform (DCT) coefficients, which correspond to residue data, capture information that block-based motion vectors miss. We propose a set of residue boundary histogram (RBH) features for action recognition. Each DCT block is separated into four parts to obtain four small residue maps, and each residue map is then encoded with histogram-based descriptors to obtain local features. Experimental results on three challenging datasets show that the proposed RBH features significantly improve upon motion-vector-based features. The approach is more than 100× faster than traditional action recognition approaches, yet its results remain highly competitive with them.
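The pipeline described above (split each DCT block into four parts, build four residue maps, encode each map with a histogram-based descriptor) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how a block is split or which descriptor is used, so the quadrant split, the summed absolute coefficient per quadrant, and the HOG-style orientation histogram are all assumptions, and every function name here is hypothetical.

```python
import math


def split_block(block):
    """Split a square 2N x 2N coefficient block into four N x N quadrants.

    Assumption: the "four parts" are taken to be the four quadrants.
    """
    n = len(block) // 2
    top, bottom = block[:n], block[n:]
    return [
        [row[:n] for row in top],     # top-left
        [row[n:] for row in top],     # top-right
        [row[:n] for row in bottom],  # bottom-left
        [row[n:] for row in bottom],  # bottom-right
    ]


def residue_maps(blocks):
    """Build four maps with one value per block.

    `blocks` is a 2-D grid (list of rows) of coefficient blocks.
    Assumption: each quadrant is summarised by its summed absolute coefficients.
    """
    maps = [[[0.0] * len(blocks[0]) for _ in blocks] for _ in range(4)]
    for i, row in enumerate(blocks):
        for j, block in enumerate(row):
            for k, quad in enumerate(split_block(block)):
                maps[k][i][j] = sum(abs(c) for r in quad for c in r)
    return maps


def orientation_histogram(residue_map, bins=8):
    """HOG-style histogram of gradient orientations, weighted by gradient
    magnitude and L1-normalised. Border cells are skipped."""
    hist = [0.0] * bins
    height, width = len(residue_map), len(residue_map[0])
    for i in range(1, height - 1):
        for j in range(1, width - 1):
            gx = residue_map[i][j + 1] - residue_map[i][j - 1]
            gy = residue_map[i + 1][j] - residue_map[i - 1][j]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += mag
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def rbh_like_descriptor(blocks, bins=8):
    """Concatenate the orientation histograms of the four residue maps."""
    return [v for m in residue_maps(blocks)
            for v in orientation_histogram(m, bins)]
```

With the default of 8 bins the descriptor has 4 × 8 = 32 values per grid of blocks. A real system would read the coefficients from the decoder and would likely pool such histograms over spatio-temporal cells, which this sketch omits.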
