Paper information - High-Speed Action Recognition and Localization in Compressed Domain Videos

We present a compressed domain scheme that recognizes and localizes actions at high speed. The recognition problem is posed as performing an action video query on a test video sequence. Our method is based on computing motion similarity using compressed domain features, which can be extracted with low complexity. We introduce a novel motion correlation measure that takes into account differences in motion directions and magnitudes. Our method is appearance-invariant, requires no prior segmentation, alignment, or stabilization, and is able to localize actions in both space and time. We evaluated our method on a benchmark action video database consisting of six actions performed by 25 people under three different scenarios. Our proposed method achieved a classification accuracy of 90%, comparing favorably with existing methods in action classification accuracy. It is able to localize a template video of 80 × 64 pixels with 23 frames in a test video of 368 × 184 pixels with 835 frames in just 11 s, easily outperforming other methods in localization speed. We also perform a systematic investigation of the effects of various encoding options on our proposed approach. In particular, we present results on the compression-classification tradeoff, which provide insight into jointly designing a system that performs video encoding at the camera front-end and action classification at the processing back-end.
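The abstract does not state the form of the motion correlation measure, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes each frame is a grid of per-block motion vectors (as a decoder might expose them), scores a pair of vectors by the cosine of the angle between them scaled by their magnitude ratio, and slides a template clip over a test clip to find the best-matching position in space and time. The names `vector_similarity`, `clip_similarity`, and `localize` are hypothetical.

```python
from typing import List, Tuple

Vector = Tuple[float, float]   # (dx, dy) motion vector of one block
Frame = List[List[Vector]]     # rows of per-block motion vectors
Clip = List[Frame]             # frames ordered in time


def vector_similarity(u: Vector, v: Vector) -> float:
    """Assumed measure: cosine of the angle times the magnitude ratio.

    This equals (u . v) / max(|u|^2, |v|^2) and lies in [-1, 1].
    Pairs with no motion at all contribute 0.
    """
    dot = u[0] * v[0] + u[1] * v[1]
    denom = max(u[0] ** 2 + u[1] ** 2, v[0] ** 2 + v[1] ** 2)
    return dot / denom if denom > 0 else 0.0


def clip_similarity(template: Clip, video: Clip,
                    t0: int, y0: int, x0: int) -> float:
    """Mean vector similarity with the template placed at (t0, y0, x0)."""
    total = 0.0
    count = 0
    for t, frame in enumerate(template):
        for y, row in enumerate(frame):
            for x, u in enumerate(row):
                total += vector_similarity(u, video[t0 + t][y0 + y][x0 + x])
                count += 1
    return total / count


def localize(template: Clip, video: Clip) -> Tuple[float, int, int, int]:
    """Exhaustively search all placements; return (score, t0, y0, x0)."""
    T, H, W = len(template), len(template[0]), len(template[0][0])
    best = (float("-inf"), 0, 0, 0)
    for t0 in range(len(video) - T + 1):
        for y0 in range(len(video[0]) - H + 1):
            for x0 in range(len(video[0][0]) - W + 1):
                score = clip_similarity(template, video, t0, y0, x0)
                if score > best[0]:
                    best = (score, t0, y0, x0)
    return best
```

The exhaustive search is written for clarity rather than speed; the localization time reported in the abstract comes from the authors' own implementation, not from a sketch like this one.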
