On the Effects of Low Video Quality in Human Action Recognition

Human activity recognition has been one of the most intensively studied areas of computer vision and pattern recognition in recent years. A wide variety of approaches have been shown to work well under challenging variations in appearance, pose and illumination. However, low video quality remains an unexplored and challenging issue in real-world applications. In this paper, we investigate the effects of low video quality on human action recognition from two perspectives: videos that are poorly sampled spatially (low resolution) and temporally (low frame rate), and compressed videos affected by motion blurring and artifacts. To increase the robustness of the feature representation under these conditions, we propose using textural features to complement the popular shape and motion features. Extensive experiments were carried out on two well-known benchmark datasets of contrasting nature: the classic KTH dataset and the large-scale HMDB51 dataset. Results obtained with two popular representation schemes (Bag-of-Words and Fisher Vectors) further validate the effectiveness of the proposed approach.
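The two sampling degradations described in the abstract (low resolution and low frame rate) can be simulated on an uncompressed clip roughly as follows. This is a minimal sketch, assuming the video is held as a NumPy array of frames; the function name `degrade_video`, the integer factors and the block-averaging downsampler are illustrative choices and not the paper's stated protocol. Compression artifacts require an actual video codec and are not modelled here.

```python
import numpy as np


def degrade_video(video: np.ndarray, spatial_factor: int = 2, temporal_factor: int = 2) -> np.ndarray:
    """Hypothetical helper: simulate a spatially and temporally poorly sampled video.

    video:           array of shape (T, H, W) or (T, H, W, C).
    spatial_factor:  downsampling factor applied to height and width.
    temporal_factor: keep one frame out of every `temporal_factor` frames.
    """
    if spatial_factor < 1 or temporal_factor < 1:
        raise ValueError("factors must be >= 1")
    # Lower the frame rate by dropping frames.
    clip = video[::temporal_factor]
    t, h, w = clip.shape[:3]
    # Crop so that height and width are divisible by the spatial factor.
    h2, w2 = h // spatial_factor, w // spatial_factor
    clip = clip[:, : h2 * spatial_factor, : w2 * spatial_factor]
    # Lower the resolution by averaging non-overlapping spatial blocks.
    blocks = clip.reshape((t, h2, spatial_factor, w2, spatial_factor) + clip.shape[3:])
    return blocks.astype(np.float64).mean(axis=(2, 4))
```

For example, a 30-frame clip of 120x160 pixels degraded with `spatial_factor=2, temporal_factor=3` comes out as 10 frames of 60x80 pixels. Block averaging acts as a crude anti-aliasing filter; plain decimation (`video[::t, ::s, ::s]`) would be an equally plausible way to produce the low-quality versions.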
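The abstract does not name the textural descriptor it proposes. Purely as an illustration, assuming a basic Local Binary Pattern (LBP) operator, which is a widely used texture descriptor, a texture histogram for a single grayscale frame could be computed as below. The 3x3 neighbourhood, the 256-bin histogram and the name `lbp_histogram` are assumptions; in an action-recognition pipeline such codes would more likely be computed over local patches or space-time volumes and then encoded, rather than pooled into one global histogram.

```python
import numpy as np


def lbp_histogram(gray: np.ndarray) -> np.ndarray:
    """Hypothetical helper: basic 8-neighbour LBP codes of a 2-D grayscale image,
    pooled into an L1-normalised 256-bin histogram. Border pixels are skipped."""
    img = np.asarray(gray, dtype=np.float64)
    if img.ndim != 2 or img.shape[0] < 3 or img.shape[1] < 3:
        raise ValueError("expected a 2-D image of at least 3x3 pixels")
    center = img[1:-1, 1:-1]
    h, w = center.shape
    # Neighbour offsets in clockwise order, starting at the top-left pixel (bit 0).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()
```

A flat image maps every pixel to code 255, while an isolated bright pixel maps to code 0, which is what makes the histogram sensitive to local texture rather than absolute intensity.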
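Of the two representation schemes named in the abstract, Bag-of-Words is the simpler one. A minimal hard-assignment encoder is sketched below, assuming the local descriptors of one video and a codebook (for instance, k-means centroids learned on training data) are already available as NumPy arrays. The function name `bow_encode` is hypothetical, and the abstract does not state the codebook size, assignment rule or normalisation actually used. Fisher Vector encoding additionally requires a Gaussian mixture model and is not shown.

```python
import numpy as np


def bow_encode(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Hypothetical helper: hard-assignment Bag-of-Words histogram.

    descriptors: (N, D) local descriptors extracted from one video.
    codebook:    (K, D) visual words.
    Returns an L1-normalised K-bin histogram.
    """
    # Squared Euclidean distance from every descriptor to every visual word.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Assign each descriptor to its nearest word and count the assignments.
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(np.float64)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

The resulting fixed-length histogram is what a classifier such as an SVM would consume, regardless of how many descriptors a given video produced.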