Search video action proposal with recurrent and static YOLO

In this paper, we propose a new approach for searching action proposals in unconstrained videos. Our method first produces snippet-level action proposals by combining the state-of-the-art YOLO detector (Static YOLO) with our regression-based RNN detector (Recurrent YOLO). These short action proposals are then integrated into final action proposals by solving a two-pass dynamic programming problem that jointly maximizes actionness score and temporal smoothness. Our experimental comparison with other state-of-the-art methods on the challenging UCF101 dataset shows that our method advances state-of-the-art proposal generation performance while maintaining low computational cost.
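
The linking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the two-pass formulation, so the code below assumes a single Viterbi-style pass (forward accumulation followed by a backtrace) that picks one box per frame. The names `iou`, `link_boxes`, and `smooth_weight`, and the use of IoU between consecutive boxes as the smoothness term, are all assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_boxes(
    boxes: List[List[Box]],
    scores: List[List[float]],
    smooth_weight: float = 1.0,
) -> List[int]:
    """Choose one box index per frame.

    Maximizes the sum of per-box scores (standing in for actionness) plus
    smooth_weight times the IoU of consecutive chosen boxes (standing in
    for temporal smoothness). Assumes every frame has at least one box.
    """
    n = len(boxes)
    if n == 0:
        return []

    # best[t][j]: best total score of any path ending at box j of frame t.
    best = [list(scores[0])]
    # back[t][j]: index of the box in frame t-1 on that best path.
    back = [[-1] * len(boxes[0])]

    # Forward accumulation over frames.
    for t in range(1, n):
        row_best, row_back = [], []
        for j, box_j in enumerate(boxes[t]):
            cands = [
                best[t - 1][i] + smooth_weight * iou(box_i, box_j)
                for i, box_i in enumerate(boxes[t - 1])
            ]
            i_star = max(range(len(cands)), key=cands.__getitem__)
            row_best.append(cands[i_star] + scores[t][j])
            row_back.append(i_star)
        best.append(row_best)
        back.append(row_back)

    # Backtrace from the best final box.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for t in range(n - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path
```

With `smooth_weight=0.0` the sketch simply follows the highest scores frame by frame, while larger values favor paths whose consecutive boxes overlap, which is the trade-off between actionness and smoothness that the abstract refers to.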
