Long-Term Feature Banks for Detailed Video Understanding
Kaiming He | Ross B. Girshick | Christoph Feichtenhofer | Chao-Yuan Wu | Philipp Krähenbühl | Haoqi Fan