RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning
Song-Chun Zhu | Anima Anandkumar | Yuke Zhu | Zhiding Yu | Chaowei Xiao | Weili Nie | Xiaojian Ma | Huaizu Jiang
[1] Anima Anandkumar,et al. Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] PVTv2: Improved Baselines with Pyramid Vision Transformer , 2021, Computational Visual Media.
[3] Chunyuan Li,et al. Efficient Self-supervised Vision Transformers for Representation Learning , 2021, ICLR.
[4] Alexander Kolesnikov,et al. Scaling Vision Transformers , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Song-Chun Zhu,et al. Unsupervised Foreground Extraction via Deep Region Competition , 2021, NeurIPS.
[6] Jenq-Neng Hwang,et al. Is Object Detection Necessary for Human-Object Interaction Recognition? , 2021, ArXiv.
[7] Julien Mairal,et al. Emerging Properties in Self-Supervised Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[8] Saining Xie,et al. An Empirical Study of Training Self-Supervised Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[9] Xiang Li,et al. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[10] Song-Chun Zhu,et al. HALMA: Humanlike Abstraction Learning Meets Affordance in Rapid Problem Solving , 2021, ArXiv.
[11] Xinlei Chen,et al. Exploring Simple Siamese Representation Learning , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Tao Kong,et al. Dense Contrastive Learning for Self-Supervised Visual Pre-Training , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[14] Stephen Clark,et al. Grounded Language Learning Fast and Slow , 2020, ICLR.
[15] Stephen Lin,et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[16] Dilin Wang,et al. Improve Vision Transformers Training by Suppressing Over-smoothing , 2021, ArXiv.
[17] Felix Hill,et al. Object-based attention for spatio-temporal reasoning: Outperforming neuro-symbolic models with flexible distributed architectures , 2020, ArXiv.
[18] Ankit B. Patel,et al. Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning , 2020, NeurIPS.
[19] Y. Qiao,et al. Visual Compositional Learning for Human-Object Interaction Detection , 2020, ECCV.
[20] Been Kim,et al. Concept Bottleneck Models , 2020, ICML.
[21] Thomas Kipf,et al. Object-Centric Learning with Slot Attention , 2020, NeurIPS.
[22] Jianfeng Gao,et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks , 2020, ECCV.
[23] Cewu Lu,et al. PaStaNet: Toward Human Activity Knowledge Engine , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] B. Lake,et al. A Benchmark for Systematic Generalization in Grounded Language Understanding , 2020, NeurIPS.
[25] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[26] Xinlei Chen,et al. In Defense of Grid Features for Visual Question Answering , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Sungjin Ahn,et al. SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition , 2020, ICLR.
[28] Ross B. Girshick,et al. Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Mathijs Mul,et al. Compositionality Decomposed: How do Neural Networks Generalise? , 2019, J. Artif. Intell. Res.
[30] Anton van den Hengel,et al. V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices , 2019, AAAI.
[31] Phillip Isola,et al. Contrastive Multiview Coding , 2019, ECCV.
[32] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[33] Cho-Jui Hsieh,et al. VisualBERT: A Simple and Performant Baseline for Vision and Language , 2019, ArXiv.
[34] Zhou Yu,et al. Deep Modular Co-Attention Networks for Visual Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Feng Gao,et al. RAVEN: A Dataset for Relational and Analogical Visual REasoNing , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Klaus Greff,et al. Multi-Object Representation Learning with Iterative Variational Inference , 2019, ICML.
[37] Christopher D. Manning,et al. GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Bernhard Schölkopf,et al. Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations , 2018, ICML.
[39] Aaron C. Courville,et al. Systematic Generalization: What Is Required and Can It Be Learned? , 2018, ICLR.
[40] Chuang Gan,et al. Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding , 2018, NeurIPS.
[41] Cewu Lu,et al. Pairwise Body-Part Attention for Recognizing Human-Object Interactions , 2018, ECCV.
[42] Matthijs Douze,et al. Deep Clustering for Unsupervised Learning of Visual Features , 2018, ECCV.
[43] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[44] Felix Hill,et al. Measuring abstract reasoning in neural networks , 2018, ICML.
[45] Daniel L. K. Yamins,et al. Flexible Neural Representation for Physics Prediction , 2018, NeurIPS.
[46] Li Fei-Fei,et al. Scaling Human-Object Interaction Recognition Through Zero-Shot Learning , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).
[47] Andriy Mnih,et al. Disentangling by Factorising , 2018, ICML.
[48] Christopher D. Manning,et al. Compositional Attention Networks for Machine Reasoning , 2018, ICLR.
[49] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Deva Ramanan,et al. Attentional Pooling for Action Recognition , 2017, NIPS.
[51] Abhinav Gupta,et al. Transitive Invariance for Self-Supervised Visual Representation Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[52] Alexander Kuhnle,et al. ShapeWorld - A new test methodology for multimodal language understanding , 2017, ArXiv.
[53] Li Fei-Fei,et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Christopher Burgess,et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR.
[55] Svetlana Lazebnik,et al. Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering , 2016, ECCV.
[56] Geoffrey E. Hinton,et al. Attend, Infer, Repeat: Fast Scene Understanding with Generative Models , 2016, NIPS.
[57] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[58] Jiaxuan Wang,et al. HICO: A Benchmark for Recognizing Human-Object Interactions in Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[59] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[60] Ali Farhadi,et al. Phrasal Recognition , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[61] Antonio Torralba,et al. SIFT Flow: Dense Correspondence across Scenes and Its Applications , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.