Exploiting textures for better action recognition in low-quality videos

Human action recognition has matured considerably as a field of study in recent years, owing to its widespread use in various applications. A number of related research problems, such as feature representations, human pose and body part detection, and scene/object context, are being actively studied. However, the general problem of video quality, a realistic issue given low-cost surveillance infrastructure and mobile devices, has not been systematically investigated. In this paper, we address the problem of action recognition in low-quality videos from several perspectives: spatial and temporal downsampling, video compression, and the presence of motion blur and compression artifacts. To increase the resilience of feature representations in these types of videos, we propose to use textural features to complement classical shape and motion features. Extensive experiments were carried out on low-quality versions of three publicly available datasets: KTH, UCF-YouTube, and HMDB. Experimental results and analysis suggest that leveraging textural features can significantly improve action recognition performance under low video quality conditions.
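As a rough illustration of two of the degradations named above and of what a textural feature can look like, the following Python sketch downsamples a grayscale video in time and space and computes a basic local binary pattern (LBP) histogram for one frame. It is only a sketch under stated assumptions: the function names, the block-averaging scheme, and the plain 8-neighbour LBP are illustrative choices, not the descriptors or the degradation pipeline used in the paper.

```python
import numpy as np


def temporal_downsample(video, factor):
    """Keep every `factor`-th frame of a (T, H, W) grayscale video."""
    return video[::factor]


def spatial_downsample(video, factor):
    """Shrink each frame by averaging non-overlapping factor x factor blocks.

    Block averaging is an assumed, simple way to lower spatial resolution.
    """
    t, h, w = video.shape
    h2, w2 = h // factor, w // factor
    cropped = video[:, :h2 * factor, :w2 * factor]
    blocks = cropped.reshape(t, h2, factor, w2, factor)
    return blocks.mean(axis=(2, 4))


def lbp_histogram(frame):
    """Normalized 256-bin histogram of basic 8-neighbour LBP codes.

    Illustrative stand-in for a textural feature; border pixels are skipped.
    """
    h, w = frame.shape
    center = frame[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), visited clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = frame[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the center.
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()
```

A texture histogram of this kind could, for example, be concatenated with shape and motion descriptors before classification; how the features are actually combined is described in the paper itself, not in this sketch.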
