Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs
Juan Carlos Niebles | Li Fei-Fei | Ranjay Krishna | Jingwei Ji
[1] R. Barker,et al. One boy's day: a specimen record of behavior , 1951 .
[2] C. Stendler. One Boy's Day: A Specimen Record of Behavior. , 1952 .
[3] R. Barker,et al. Midwest and its children: the psychological ecology of an American town. , 1954 .
[4] A. Michotte. The perception of causality , 1963 .
[5] Darren Newtson. Attribution and the unit of perception of ongoing behavior. , 1973 .
[6] G. Miller,et al. Language and Perception , 1976 .
[7] George A. Miller,et al. WordNet: A Lexical Database for English , 1995, HLT.
[8] J. Skorupski. The international research library of philosophy , 1993 .
[9] Jeffrey M. Zacks,et al. Perceiving, remembering, and communicating structure in events. , 2001, Journal of experimental psychology. General.
[10] Jeffrey M. Zacks,et al. Human brain activity time-locked to perceptual event boundaries , 2001, Nature Neuroscience.
[11] Pietro Perona,et al. A Bayesian approach to unsupervised one-shot learning of object categories , 2003, ICCV 2003.
[12] Pietro Perona,et al. A Bayesian approach to unsupervised one-shot learning of object categories , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.
[13] Aron Culotta,et al. Dependency Tree Kernels for Relation Extraction , 2004, ACL.
[14] Jian Su,et al. Exploring Various Knowledge in Relation Extraction , 2005, ACL.
[15] B. Tversky,et al. Making sense of abstract events: Building event schemas , 2006, Memory & cognition.
[16] Pietro Perona,et al. One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[17] Benjamin Z. Yao,et al. Introduction to a Large-Scale General Purpose Ground Truth Database: Methodology, Annotation Tool and Benchmarks , 2007, EMMCVPR.
[18] Guodong Zhou,et al. Tree Kernel-Based Relation Extraction with Context-Sensitive Structured Parse Tree Information , 2007, EMNLP.
[19] Jeffrey M. Zacks,et al. A Computational Model of Event Segmentation From Perceptual Prediction , 2007, Cogn. Sci.
[20] Jeffrey M. Zacks,et al. Segmentation in the perception and memory of events , 2008, Trends in Cognitive Sciences.
[21] Charless C. Fowlkes,et al. Discriminative Models for Multi-Class Object Layout , 2009, 2009 IEEE 12th International Conference on Computer Vision.
[22] Ali Farhadi,et al. Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[23] Zhuowen Tu,et al. Auto-Context and Its Application to High-Level Vision Tasks and 3D Brain Image Segmentation , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[24] Hao Su,et al. Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.
[25] Kristen Grauman,et al. Relative attributes , 2011, 2011 International Conference on Computer Vision.
[26] Tal Hassner,et al. One Shot Similarity Metric Learning for Action Recognition , 2011, SIMBAD.
[27] Vladlen Koltun,et al. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.
[28] Trevor Darrell,et al. Detection bank: an object detection based video representation for multimedia event recognition , 2012, ACM Multimedia.
[29] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[30] Andrew Zisserman,et al. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.
[31] Juan Carlos Niebles,et al. Discriminative Hierarchical Modeling of Spatio-temporally Composable Human Activities , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[32] Bernard Ghanem,et al. ActivityNet: A large-scale video benchmark for human activity understanding , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Li Fei-Fei,et al. Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval , 2015, VL@EMNLP.
[34] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[35] Matthew J. Hausknecht,et al. Beyond short snippets: Deep networks for video classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Michael S. Bernstein,et al. Image retrieval using scene graphs , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[38] Cordelia Schmid,et al. Towards Weakly-Supervised Action Localization , 2016, ArXiv.
[39] Li Fei-Fei,et al. End-to-End Learning of Action Detection from Frame Glimpses in Videos , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Basura Fernando,et al. SPICE: Semantic Propositional Image Caption Evaluation , 2016, ECCV.
[41] Ali Farhadi,et al. Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding , 2016, ECCV.
[42] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Silvio Savarese,et al. Structural-RNN: Deep Learning on Spatio-Temporal Graphs , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Michael S. Bernstein,et al. Visual Relationship Detection with Language Priors , 2016, ECCV.
[45] Andrea Vedaldi,et al. Salient Deconvolutional Networks , 2016, ECCV.
[46] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[47] Bernard Ghanem,et al. DAPs: Deep Action Proposals for Action Understanding , 2016, ECCV.
[48] Luc Van Gool,et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.
[49] Li Fei-Fei,et al. Inferring and Executing Programs for Visual Reasoning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[50] Danfei Xu,et al. Scene Graph Generation by Iterative Message Passing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Cordelia Schmid,et al. Human Action Localization with Sparse Spatial Supervision , 2017 .
[52] Xiaogang Wang,et al. ViP-CNN: Visual Phrase Guided Convolutional Neural Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Bo Dai,et al. Detecting Visual Relationships with Deep Relational Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Eric P. Xing,et al. Deep Variation-Structured Reinforcement Learning for Visual Relationship and Attribute Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Jia Deng,et al. Pixels to Graphs by Associative Embedding , 2017, NIPS.
[56] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.
[57] Juan Carlos Niebles,et al. Dense-Captioning Events in Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[58] Li Fei-Fei,et al. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos , 2015, International Journal of Computer Vision.
[59] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[60] Xiaogang Wang,et al. Scene Graph Generation from Objects, Phrases and Region Captions , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[61] Michael S. Bernstein,et al. Referring Relationships , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[62] Yejin Choi,et al. Neural Motifs: Scene Graph Parsing with Global Context , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[63] Stefan Lee,et al. Graph R-CNN for Scene Graph Generation , 2018, ECCV.
[64] Yi Yang,et al. Compound Memory Networks for Few-Shot Video Classification , 2018, ECCV.
[65] Piyush Rai,et al. A Generative Approach to Zero-Shot and Few-Shot Action Recognition , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).
[66] Li Fei-Fei,et al. Image Generation from Scene Graphs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[67] Cordelia Schmid,et al. Long-Term Temporal Convolutions for Action Recognition , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[68] Christian Wolf,et al. Object Level Visual Reasoning in Videos , 2018, ECCV.
[69] Gang Yu,et al. Human Centric Spatio-Temporal Action Localization , 2018 .
[70] Cordelia Schmid,et al. AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[71] Asim Kadav,et al. Attend and Interact: Higher-Order Object Interactions for Video Understanding , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[72] Cordelia Schmid,et al. Actor-Centric Relation Network , 2018, ECCV.
[73] Andrew Zisserman,et al. A Better Baseline for AVA , 2018, ArXiv.
[74] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[75] Bolei Zhou,et al. Temporal Relational Reasoning in Videos , 2017, ECCV.
[76] Xiaogang Wang,et al. Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation , 2018, ECCV.
[77] Abhinav Gupta,et al. Videos as Space-Time Region Graphs , 2018, ECCV.
[78] Jonathan Berant,et al. Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction , 2018, NeurIPS.
[79] Dima Damen,et al. Scaling Egocentric Vision: The EPIC-KITCHENS Dataset , 2018, ArXiv.
[80] Ji Zhang,et al. Graphical Contrastive Losses for Scene Graph Parsing , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[81] Shuaib Ahmed,et al. ProtoGAN: Towards Few Shot Learning for Action Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[82] Michael S. Bernstein,et al. Scene Graph Prediction with Limited Labels , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[83] Kaiming He,et al. Long-Term Feature Banks for Detailed Video Understanding , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[84] Chuang Gan,et al. TSM: Temporal Shift Module for Efficient Video Understanding , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[85] Arnold W. M. Smeulders,et al. Timeception for Complex Action Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[86] Hang Zhao,et al. HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization , 2017, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[87] Yongkang Wong,et al. Explainable Video Action Reasoning via Prior Knowledge and State Transitions , 2019, ACM Multimedia.
[88] Andrew Zisserman,et al. A Short Note on the Kinetics-700 Human Action Dataset , 2019, ArXiv.
[89] Jitendra Malik,et al. SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[90] Andrew Zisserman,et al. Video Action Transformer Network , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[91] Michael S. Bernstein,et al. Visual Relationships as Functions: Enabling Few-Shot Scene Graph Prediction , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[92] Oron Ashual,et al. Specifying Object Attributes and Relations in Interactive Scene Generation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[93] Ioannis Patras,et al. TARN: Temporal Attentive Relation Network for Few-Shot and Zero-Shot Action Recognition , 2019, BMVC.
[94] Abhishek Das,et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).