SlowFast Networks for Video Recognition
暂无分享,去创建一个
Jitendra Malik | Kaiming He | Christoph Feichtenhofer | Haoqi Fan | Kaiming He | Jitendra Malik | Haoqi Fan | Christoph Feichtenhofer
[1] P. Lennie,et al. Spatial and temporal contrast sensitivities of neurones in lateral geniculate nucleus of macaque. , 1984, The Journal of physiology.
[2] E H Adelson,et al. Spatiotemporal energy models for the perception of motion. , 1985, Journal of the Optical Society of America. A, Optics and image science.
[3] D. Hubel,et al. Segregation of form, color, movement, and depth: anatomy, physiology, and perception. , 1988, Science.
[4] D. J. Felleman,et al. Distributed hierarchical processing in the primate cerebral cortex. , 1991, Cerebral cortex.
[5] D. Ruderman. The statistics of natural images , 1994 .
[6] D. V. Essen,et al. Neural mechanisms of form and motion processing in the primate visual system , 1994, Neuron.
[7] David Mumford,et al. Statistics of natural images and models , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).
[8] Edward H. Adelson,et al. Motion illusions as optimal percepts , 2002, Nature Neuroscience.
[9] Serge J. Belongie,et al. Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.
[10] Cordelia Schmid,et al. Human Detection Using Oriented Histograms of Flow and Appearance , 2006, ECCV.
[11] Cordelia Schmid,et al. Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[12] Cordelia Schmid,et al. A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.
[13] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[14] Yann LeCun,et al. Convolutional Learning of Spatio-temporal Features , 2010, ECCV.
[15] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.
[16] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[17] Cordelia Schmid,et al. Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.
[18] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[19] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[20] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[21] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[22] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.
[23] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[24] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[25] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[27] Ali Farhadi,et al. Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding , 2016, ECCV.
[28] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Richard P. Wildes,et al. Spatiotemporal Residual Networks for Video Action Recognition , 2016, NIPS.
[30] Andrew Zisserman,et al. Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Luc Van Gool,et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.
[32] Tao Mei,et al. Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[33] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.
[34] Chen Sun,et al. Rethinking Spatiotemporal Feature Learning For Video Understanding , 2017, ArXiv.
[35] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.
[36] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.
[37] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Kaiming He,et al. Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Ali Farhadi,et al. Asynchronous Temporal Fields for Action Recognition , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Fu Li,et al. Exploiting Spatial-Temporal Modelling and Multi-Modal Fusion for Human Action Recognition , 2018, ArXiv.
[42] Cordelia Schmid,et al. Long-Term Temporal Convolutions for Action Recognition , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[43] Gang Yu,et al. Human Centric Spatio-Temporal Action Localization , 2018 .
[44] Bernard Ghanem,et al. The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary , 2018, ArXiv.
[45] Cordelia Schmid,et al. AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[46] Andrew Zisserman,et al. A Short Note about Kinetics-600 , 2018, ArXiv.
[47] Cordelia Schmid,et al. Actor-Centric Relation Network , 2018, ECCV.
[48] Andrew Zisserman,et al. A Better Baseline for AVA , 2018, ArXiv.
[49] Luc Van Gool,et al. Spatio-Temporal Channel Correlation Networks for Action Classification , 2018, ECCV.
[50] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[51] Yann LeCun,et al. A Closer Look at Spatiotemporal Convolutions for Action Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[52] Bolei Zhou,et al. Temporal Relational Reasoning in Videos , 2017, ECCV.
[53] Alexander J. Smola,et al. Compressed Video Action Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[54] Thomas Brox,et al. ECO: Efficient Convolutional Network for Online Video Understanding , 2018, ECCV.
[55] Abhinav Gupta,et al. Videos as Space-Time Region Graphs , 2018, ECCV.
[56] Limin Wang,et al. Appearance-and-Relation Networks for Video Classification , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[57] Kaiming He,et al. Long-Term Feature Banks for Detailed Video Understanding , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[58] Andrew Zisserman,et al. A Short Note on the Kinetics-700 Human Action Dataset , 2019, ArXiv.
[59] Ross B. Girshick,et al. Mask R-CNN , 2017, 1703.06870.