GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement
[1] Rita Cucchiara, et al. From Show to Tell: A Survey on Deep Learning-Based Image Captioning, 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[2] Suha Kwak, et al. Collaborative Transformers for Grounded Situation Recognition, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Xiangnan He, et al. Group Contextualization for Video Recognition, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Angela Yao, et al. Video as Conditional Graph Hierarchy for Multi-Granular Question Answering, 2021, AAAI.
[5] Tat-Seng Chua, et al. Rethinking the Two-Stage Framework for Grounded Situation Recognition, 2021, AAAI.
[6] Suha Kwak, et al. Grounded Situation Recognition with Transformers, 2021, BMVC.
[7] Chong-Wah Ngo, et al. Token Shift Transformer for Video Classification, 2021, ACM Multimedia.
[8] Bodo Rosenhahn, et al. Spatial-Temporal Transformer for Dynamic Scene Graph Generation, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[9] Olga Russakovsky, et al. Understanding and Evaluating Racial Biases in Image Captioning, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[10] Lu Yuan, et al. Dynamic Head: Unifying Object Detection Heads with Attentions, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Andrea Bacciu, et al. Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources, 2021, NAACL.
[12] Qi Wu, et al. Towards Accurate Text-based Image Captioning with Content Diversity Exploration, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Arka Sadhu, et al. Visual Semantic Role Labeling for Video Understanding, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Cho-Jui Hsieh, et al. Robust and Accurate Object Detection via Adversarial Learning, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Shuohang Wang, et al. LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval, 2021, NAACL.
[16] Matthieu Cord, et al. Training data-efficient image transformers & distillation through attention, 2020, ICML.
[17] Yongjian Wu, et al. Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network, 2020, AAAI.
[18] Xiao Wu, et al. DB-LSTM: Densely-connected Bi-directional LSTM for human action recognition, 2020, Neurocomputing.
[19] S. Gelly, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2020, ICLR.
[20] Bin Li, et al. Deformable DETR: Deformable Transformers for End-to-End Object Detection, 2020, ICLR.
[21] Stephen Lin, et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[22] Chun Yuan, et al. HOSE-Net: Higher Order Structure Embedded Network for Scene Graph Generation, 2020, ACM Multimedia.
[23] Ngai-Man Cheung, et al. Attention-Based Context Aware Reasoning for Situation Recognition, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Nicolas Usunier, et al. End-to-End Object Detection with Transformers, 2020, ECCV.
[25] Bolei Zhou, et al. Temporal Pyramid Network for Action Recognition, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Ali Farhadi, et al. Grounded Situation Recognition, 2020, ECCV.
[27] Shiliang Pu, et al. Counterfactual Samples Synthesizing for Robust Visual Question Answering, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Jiashi Feng, et al. PPDM: Parallel Point Detection and Matching for Real-Time Human-Object Interaction Detection, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Marcella Cornia, et al. Meshed-Memory Transformer for Image Captioning, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Quoc V. Le, et al. EfficientDet: Scalable and Efficient Object Detection, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Leonid Sigal, et al. Mixture-Kernel Graph Attention Network for Situation Recognition, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[32] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, ArXiv.
[33] Yang Wang, et al. Cross-Modal Self-Attention Network for Referring Image Segmentation, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Yu Cheng, et al. Relation-Aware Graph Attention Network for Visual Question Answering, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[35] Silvio Savarese, et al. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Matthieu Cord, et al. MUREL: Multimodal Relational Reasoning for Visual Question Answering, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Long Chen, et al. Counterfactual Critic Multi-Agent Training for Scene Graph Generation, 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[38] Xiao Wu, et al. Learning to Transfer: Generalizable Attribute Learning with Multitask Neural Model Search, 2018, ACM Multimedia.
[39] Stefan Lee, et al. Graph R-CNN for Scene Graph Generation, 2018, ECCV.
[40] Andrew McCallum, et al. Linguistically-Informed Self-Attention for Semantic Role Labeling, 2018, EMNLP.
[41] Xi Li, et al. GNAS: A Greedy Neural Architecture Search Method for Multi-Attribute Learning, 2018, ACM Multimedia.
[42] Dustin Tran, et al. Image Transformer, 2018, ICML.
[43] Sanja Fidler, et al. MovieGraphs: Towards Understanding Human-Centric Situations from Videos, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[44] Yidong Chen, et al. Deep Semantic Role Labeling with Self-Attention, 2017, AAAI.
[45] Yann LeCun, et al. A Closer Look at Spatiotemporal Convolutions for Action Recognition, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[46] Yejin Choi, et al. Neural Motifs: Scene Graph Parsing with Global Context, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[47] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[48] Xiao Wu, et al. Personalized clothing recommendation combining user social circle and fashion style consistency, 2017, Multimedia Tools and Applications.
[49] Sanja Fidler, et al. Situation Recognition with Graph Neural Networks, 2018.
[50] Xiaogang Wang, et al. Scene Graph Generation from Objects, Phrases and Region Captions, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[51] Yang Liu, et al. Video2Shop: Exact Matching Clothes in Videos to Online Shopping Images, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[53] Chong-Wah Ngo, et al. On the Selection of Anchors and Targets for Video Hyperlinking, 2017, ICMR.
[54] Yang Liu, et al. Video eCommerce++: Toward Large Scale Online Video Advertising, 2017, IEEE Transactions on Multimedia.
[55] Andrew Zisserman, et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Samuel S. Schoenholz, et al. Neural Message Passing for Quantum Chemistry, 2017, ICML.
[57] Svetlana Lazebnik, et al. Recurrent Models for Situation Recognition, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[58] Danfei Xu, et al. Scene Graph Generation by Iterative Message Passing, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[59] Bohyung Han, et al. Large-Scale Image Retrieval with Attentive Deep Local Features, 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[60] Huimin Ma, et al. Single Image Action Recognition Using Semantic Body Part Actions, 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[61] Kaiming He, et al. Feature Pyramid Networks for Object Detection, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[62] Ali Farhadi, et al. Commonly Uncommon: Semantic Sparsity in Situation Recognition, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[63] Vaibhava Goel, et al. Self-Critical Sequence Training for Image Captioning, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[64] Tat-Seng Chua, et al. SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[65] Max Welling, et al. Semi-Supervised Classification with Graph Convolutional Networks, 2016, ICLR.
[66] Qing Li, et al. VIREO @ TRECVID 2017: Video-to-Text, Ad-hoc Video Search, and Video hyperlinking, 2017, TRECVID.
[67] Yang Liu, et al. Video eCommerce: Towards Online Video Advertising, 2016, ACM Multimedia.
[68] Tao Chen, et al. Context-aware Image Tweet Modelling and Recommendation, 2016, ACM Multimedia.
[69] Ali Farhadi, et al. Situation Recognition: Visual Semantic Role Labeling for Image Understanding, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[70] Jiebo Luo, et al. Image Captioning with Semantic Attention, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[71] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[72] Richard S. Zemel, et al. Gated Graph Sequence Neural Networks, 2015, ICLR.
[73] Ali Farhadi, et al. You Only Look Once: Unified, Real-Time Object Detection, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[74] Cordelia Schmid, et al. Action Recognition with Improved Trajectories, 2013, 2013 IEEE International Conference on Computer Vision.
[75] Cícero Nogueira dos Santos, et al. Semantic Role Labeling, 2012.
[76] Christopher R. Johnson, et al. Background to FrameNet, 2003.
[77] Martha Palmer, et al. From TreeBank to PropBank, 2002, LREC.
[78] Collin F. Baker, et al. Frame semantics for text understanding, 2001.
[79] John B. Lowe, et al. The Berkeley FrameNet Project, 1998, ACL.