CLIP-Event: Connecting Text and Images with Event Structures
Shuohang Wang | Heng Ji | Ruochen Xu | Shih-Fu Chang | Chenguang Zhu | Xudong Lin | Michael Zeng | Luowei Zhou | Manling Li
[1] Adams Wei Yu,et al. SimVLM: Simple Visual Language Model Pretraining with Weak Supervision , 2021, ICLR.
[2] Lisa Anne Hendricks,et al. Probing Image-Language Transformers for Verb Understanding , 2021, Findings of ACL.
[3] Yejin Choi,et al. VinVL: Revisiting Visual Representations in Vision-Language Models , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Jianlong Fu,et al. Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[6] Quoc V. Le,et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision , 2021, ICML.
[7] Wonjae Kim,et al. ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision , 2021, ICML.
[8] Hao Tian,et al. ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph , 2020, AAAI.
[9] Luciano Floridi,et al. GPT-3: Its Nature, Scope, Limits, and Consequences , 2020, Minds and Machines.
[10] Ying Lin,et al. A Joint Neural Model for Information Extraction with Global Features , 2020, ACL.
[11] Ying Lin,et al. GAIA: A Fine-grained Multimedia Knowledge Extraction System , 2020, ACL.
[12] Yu Cheng,et al. Graph Optimal Transport for Cross-Domain Alignment , 2020, ICML.
[13] Heng Ji,et al. Cross-media Structured Common Space for Multimedia Event Extraction , 2020, ACL.
[14] Yejin Choi,et al. VisualCOMET: Reasoning About the Dynamic Context of a Still Image , 2020, ECCV.
[15] Jianfeng Gao,et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks , 2020, ECCV.
[16] Fahad Shahbaz Khan,et al. Learning Human-Object Interaction Detection Using Interaction Points , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Ali Farhadi,et al. Grounded Situation Recognition , 2020, ECCV.
[18] Wenguan Wang,et al. Cascaded Human-Object Interaction Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Shih-Fu Chang,et al. Weakly Supervised Visual Semantic Parsing , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Yu Cheng,et al. UNITER: UNiversal Image-TExt Representation Learning , 2019, ECCV.
[21] Jason J. Corso,et al. Unified Vision-Language Pre-Training for Image Captioning and VQA , 2019, AAAI.
[22] J. Uijlings,et al. The Open Images Dataset V4 , 2018, International Journal of Computer Vision.
[23] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[24] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[25] Lawrence Carin,et al. Scalable Gromov-Wasserstein Learning for Graph Partitioning and Matching , 2019, NeurIPS.
[26] Ali Farhadi,et al. From Recognition to Cognition: Visual Commonsense Reasoning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Cewu Lu,et al. Transferable Interactiveness Knowledge for Human-Object Interaction Detection , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Yin Li,et al. Compositional Learning for Human Object Interaction , 2018, ECCV.
[29] Frank Hutter,et al. Fixing Weight Decay Regularization in Adam , 2017, ArXiv.
[30] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[31] Sandro Pezzelle,et al. FOIL it! Find One mismatch between Image and Language caption , 2017, ACL.
[32] Michael S. Bernstein,et al. Visual Relationship Detection with Language Priors , 2016, ECCV.
[33] Ali Farhadi,et al. Situation Recognition: Visual Semantic Role Labeling for Image Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Jiaxuan Wang,et al. HICO: A Benchmark for Recognizing Human-Object Interactions in Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[35] Christopher D. Manning,et al. Leveraging Linguistic Structure For Open Domain Information Extraction , 2015, ACL.
[36] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[37] Jitendra Malik,et al. Visual Semantic Role Labeling , 2015, ArXiv.
[38] Mihai Surdeanu,et al. The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.
[39] Marco Cuturi,et al. Sinkhorn Distances: Lightspeed Computation of Optimal Transport , 2013, NIPS.
[40] Chenliang Xu,et al. A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[41] Oren Etzioni,et al. Open Language Learning for Information Extraction , 2012, EMNLP.
[42] Bart Selman,et al. Unstructured human activity detection from RGBD images , 2011, 2012 IEEE International Conference on Robotics and Automation.
[43] Mark Steedman,et al. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning , 2012.
[44] Fei-Fei Li,et al. Modeling mutual context of object and human pose in human-object interaction activities , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[45] James Ze Wang,et al. Image retrieval: Ideas, influences, and trends of the new age , 2008, CSUR.
[46] George Loizou,et al. Computer vision and pattern recognition , 2007, Int. J. Comput. Math..
[47] Richard Sinkhorn. A Relationship Between Arbitrary Positive Matrices and Doubly Stochastic Matrices , 1964.