Improving Human Action Recognition by Non-action Classification

In this paper we consider the task of recognizing human actions in realistic video where human actions are dominated by irrelevant factors. We first study the benefits of removing non-action video segments, which are the ones that do not portray any human action. We then learn a nonaction classifier and use it to down-weight irrelevant video segments. The non-action classifier is trained using Action-Thread, a dataset with shot-level annotation for the occurrence or absence of a human action. The non-action classifier can be used to identify non-action shots with high precision and subsequently used to improve the performance of action recognition systems.

[1]  Tinne Tuytelaars,et al.  Modeling video evolution for action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[3]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[4]  Cordelia Schmid,et al.  Actions in context , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Alexander Gammerman,et al.  Ridge Regression Learning Algorithm in Dual Variables , 1998, ICML.

[6]  Ming-Syan Chen,et al.  Video Event Detection by Inferring Temporal Instance Labels , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Thomas Deselaers,et al.  What is an object? , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8]  Richard Bowden,et al.  Hollywood 3D: Recognizing Actions in 3D Natural Scenes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Dimitris Samaras,et al.  Leave-One-Out Kernel Optimization for Shadow Detection , 2018, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Andrew Zisserman,et al.  Thread-Safe: Towards Recognizing Human Actions Across Shot Boundaries , 2014, ACCV.

[11]  Ian D. Reid,et al.  Structured Learning of Human Interactions in TV Shows , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[13]  Ian D. Reid,et al.  High Five: Recognising human interactions in TV shows , 2010, BMVC.

[14]  Thomas Serre,et al.  HMDB: A large video database for human motion recognition , 2011, 2011 International Conference on Computer Vision.

[15]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[16]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[17]  Andrew Zisserman,et al.  Talking Heads: Detecting Humans and Recognizing Their Interactions , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[19]  Horst Bischof,et al.  A Duality Based Approach for Realtime TV-L1 Optical Flow , 2007, DAGM-Symposium.

[20]  Dima Damen,et al.  Proceedings of the British Machine Vision Conference , 2014, BMVC 2014.

[21]  Mubarak Shah,et al.  UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.

[22]  Cordelia Schmid,et al.  Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.

[23]  Dimitris Samaras,et al.  Noisy Label Recovery for Shadow Detection in Unfamiliar Domains , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Minh Hoai,et al.  Regularized Max Pooling for Image Categorization , 2014, BMVC.

[25]  Johan A. K. Suykens,et al.  Least Squares Support Vector Machines , 2002 .

[26]  Richard P. Wildes,et al.  Dynamically encoded actions based on spacetime saliency , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Andrew Zisserman,et al.  Improving Human Action Recognition Using Score Distribution and Ranking , 2014, ACCV.

[28]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[29]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[30]  Jitendra Malik,et al.  Finding action tubes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Liqing Zhang,et al.  Saliency Detection: A Spectral Residual Approach , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Wei Chen,et al.  Actionness Ranking with Lattice Conditional Ordinal Random Fields , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Fernando De la Torre,et al.  Joint segmentation and classification of human actions in video , 2011, CVPR 2011.

[34]  Limin Wang,et al.  Action recognition with trajectory-pooled deep-convolutional descriptors , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Johan A. K. Suykens,et al.  Least Squares Support Vector Machine Classifiers , 1999, Neural Processing Letters.

[36]  Bingbing Ni,et al.  Motion Part Regularization: Improving action recognition via trajectory group selection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Shuicheng Yan,et al.  STAP: Spatial-Temporal Attention-Aware Pooling for Action Recognition , 2015, IEEE Transactions on Circuits and Systems for Video Technology.

[38]  J.K. Aggarwal,et al.  Human activity analysis , 2011, ACM Comput. Surv..

[39]  Andrew Zisserman,et al.  Action Recognition From Weak Alignment of Body Parts , 2014, BMVC.

[40]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[41]  Thomas Deselaers,et al.  Measuring the Objectness of Image Windows , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.