Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding
Quanfu Fan | Aude Oliva | Dan Gutfreund | Bowen Pan | Rogerio Feris | Mathew Monfort | Kandan Ramakrishnan | Alex Andonian | Barry A. McNamara | Alex Lascelles
[1] Lorenzo Torresani, et al. Learning Spatiotemporal Features with 3D Convolutional Networks, 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[2] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[3] Antonio Torralba, et al. LabelMe video: Building a video database with human annotations, 2009, 2009 IEEE 12th International Conference on Computer Vision.
[4] Bolei Zhou, et al. Temporal Relational Reasoning in Videos, 2017, ECCV.
[5] Sanja Fidler, et al. Detect What You Can: Detecting and Representing Objects Using Holistic Models and Body Parts, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[6] Fabio Viola, et al. The Kinetics Human Action Video Dataset, 2017, ArXiv.
[7] Berthold K. P. Horn, et al. Determining Optical Flow, 1981, Other Conferences.
[8] James A. Anderson, et al. Neurocomputing: Foundations of Research, 1988.
[9] Zhi-Hua Zhou, et al. Multilabel Neural Networks with Applications to Functional Genomics and Text Categorization, 2006, IEEE Transactions on Knowledge and Data Engineering.
[10] Bolei Zhou, et al. Network Dissection: Quantifying Interpretability of Deep Visual Representations, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Andrew Zisserman, et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Yang Wang, et al. Pulling Actions out of Context: Explicit Separation for Effective Combination, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[13] Bolei Zhou, et al. Interpreting Deep Visual Representations via Network Dissection, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[14] Bolei Zhou, et al. Learning Deep Features for Discriminative Localization, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Bolei Zhou, et al. Moments in Time Dataset: One Million Videos for Event Understanding, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[17] Qi Wu, et al. Multilabel Image Classification With Regional Latent Semantic Dependencies, 2016, IEEE Transactions on Multimedia.
[18] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[19] Noah Snavely, et al. OpenSurfaces, 2013, ACM Trans. Graph.
[20] Antonio Torralba, et al. SoundNet: Learning Sound Representations from Unlabeled Video, 2016, NIPS.
[21] Yangqing Jia, et al. Deep Convolutional Ranking for Multilabel Image Annotation, 2013, ICLR.
[22] Mubarak Shah, et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild, 2012, ArXiv.
[23] Zhuowen Tu, et al. Aggregated Residual Transformations for Deep Neural Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Andrew Zisserman, et al. Two-Stream Convolutional Networks for Action Recognition in Videos, 2014, NIPS.
[25] Pin-Yu Chen, et al. Proper Network Interpretability Helps Adversarial Robustness in Classification, 2020, ICML.
[26] James L. McClelland, et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations, 1986.
[27] Bolei Zhou, et al. Semantic Understanding of Scenes Through the ADE20K Dataset, 2016, International Journal of Computer Vision.
[28] Wei Xu, et al. CNN-RNN: A Unified Framework for Multi-label Image Classification, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[30] Li Fei-Fei, et al. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos, 2015, International Journal of Computer Vision.
[31] Gang Wang, et al. Motion-Guided Cascaded Refinement Network for Video Object Segmentation, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[32] Cordelia Schmid, et al. AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[33] Matthew Botvinick, et al. On the importance of single directions for generalization, 2018, ICLR.
[36] Jason Weston, et al. WSABIE: Scaling Up to Large Vocabulary Image Annotation, 2011, IJCAI.
[37] Yu Zhang, et al. Exploit Bounding Box Annotations for Multi-Label Object Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Bolei Zhou, et al. GAN Dissection: Visualizing and Understanding Generative Adversarial Networks, 2018, ICLR.
[39] Jason Weston, et al. A kernel method for multi-labelled classification, 2001, NIPS.
[40] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[41] Dima Damen, et al. Scaling Egocentric Vision: The EPIC-KITCHENS Dataset, 2018, ArXiv.
[42] Chuang Gan, et al. TSM: Temporal Shift Module for Efficient Video Understanding, 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[43] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Susanne Westphal, et al. The “Something Something” Video Database for Learning and Evaluating Visual Common Sense, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[46] Sanja Fidler, et al. The Role of Context for Object Detection and Semantic Segmentation in the Wild, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[47] Bjorn Ommer, et al. A Disentangling Invertible Interpretation Network for Explaining Latent Representations, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Thomas Serre, et al. HMDB: A large video database for human motion recognition, 2011, 2011 International Conference on Computer Vision.
[49] Luc Van Gool, et al. The Pascal Visual Object Classes (VOC) Challenge, 2010, International Journal of Computer Vision.
[50] G. Kane. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 1: Foundations, vol 2: Psychological and Biological Models, 1994.
[51] Apostol Natsev, et al. YouTube-8M: A Large-Scale Video Classification Benchmark, 2016, ArXiv.
[52] George A. Miller, et al. WordNet: A Lexical Database for English, 1995, HLT.
[53] Cordelia Schmid, et al. Learning realistic human actions from movies, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[54] Ali Farhadi, et al. Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding, 2016, ECCV.
[56] Yale Song, et al. Improving Pairwise Ranking for Multi-label Image Classification, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[57] Bernard Ghanem, et al. ActivityNet: A large-scale video benchmark for human activity understanding, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).