UKnow: A Unified Knowledge Protocol for Common-Sense Reasoning and Vision-Language Pre-training
Xiaohui Xie, Deli Zhao, Yiliang Lv, Yujun Shen, Yutong Feng, Biao Gong
[1] Nils Reimers et al. MTEB: Massive Text Embedding Benchmark, 2022, EACL.
[2] Jeff Z. Pan et al. Benchmarking knowledge-driven zero-shot learning, 2021, J. Web Semant.
[3] Ludwig Schmidt et al. LAION-5B: An open large-scale dataset for training next generation image-text models, 2022, NeurIPS.
[4] Hongzhi Yin et al. MMKGR: Multi-hop Multi-modal Knowledge Graph Reasoning, 2022, 2023 IEEE 39th International Conference on Data Engineering (ICDE).
[5] Yuxin Peng et al. Learn from Unlabeled Videos for Near-duplicate Video Retrieval, 2022, SIGIR.
[6] Mohit Bansal et al. Fine-grained Image Captioning with CLIP Reward, 2022, NAACL-HLT.
[7] Li Dong et al. CLIP Models are Few-Shot Learners: Empirical Studies on VQA and Visual Entailment, 2022, ACL.
[8] J. Collomosse et al. StyleBabel: Artistic Style Tagging and Captioning, 2022, ECCV.
[9] Trishul M. Chilimbi et al. Vision-Language Pre-Training with Triple Contrastive Learning, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Zhixu Li et al. Multi-Modal Knowledge Graph Construction and Application: A Survey, 2022, IEEE Transactions on Knowledge and Data Engineering.
[11] Shuohang Wang et al. CLIP-Event: Connecting Text and Images with Event Structures, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Yan Feng et al. JointE: Jointly utilizing 1D and 2D convolution for knowledge graph embedding, 2022, Knowl. Based Syst.
[13] Liunian Harold Li et al. Grounded Language-Image Pre-training, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Runwei Ding et al. Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-supervised Action Recognition, 2021, AAAI.
[15] Tongliang Liu et al. CRIS: CLIP-Driven Referring Image Segmentation, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Zi-Yi Dou et al. An Empirical Study of Training End-to-End Vision-and-Language Transformers, 2021, Computer Vision and Pattern Recognition.
[17] Junjie Yan et al. Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm, 2021, ICLR.
[18] Zhe Gan et al. An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA, 2021, AAAI.
[19] Bing Liu et al. Zero-Shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP, 2021, AAAI.
[20] Kurt Keutzer et al. How Much Can CLIP Benefit Vision-and-Language Tasks?, 2021, ICLR.
[21] Li Dong et al. BEiT: BERT Pre-Training of Image Transformers, 2021, ICLR.
[22] Ron Mokady et al. ClipCap: CLIP Prefix for Image Captioning, 2021, ArXiv.
[23] T. Okatani et al. Matching in the Dark: A Dataset for Matching Image Pairs of Low-light Scenes, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[24] Junnan Li et al. Align before Fuse: Vision and Language Representation Learning with Momentum Distillation, 2021, NeurIPS.
[25] Jianlong Fu et al. Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training, 2021, NeurIPS.
[26] Furu Wei et al. Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment, 2021, ACL.
[27] Stephen Lin et al. Aligning Pretraining for Detection via Object-Level Contrastive Learning, 2021, NeurIPS.
[28] Yann LeCun et al. MDETR - Modulated Detection for End-to-End Multi-Modal Understanding, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[29] Jianlong Fu et al. Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Ilya Sutskever et al. Learning Transferable Visual Models From Natural Language Supervision, 2021, ICML.
[31] Quoc V. Le et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision, 2021, ICML.
[32] Wonjae Kim et al. ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision, 2021, ICML.
[33] Pieter Abbeel et al. Bottleneck Transformers for Visual Recognition, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Huchuan Lu et al. Similarity Reasoning and Filtration for Image-Text Matching, 2021, AAAI.
[35] Marcus Rohrbach et al. KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Kyunghyun Cho et al. VisualSem: a high-quality knowledge graph for vision and language, 2020, MRL.
[37] Ning Ding et al. Modeling Relation Paths for Knowledge Graph Completion, 2020, IEEE Transactions on Knowledge and Data Engineering.
[38] Yang Zhao et al. Deep High-Resolution Representation Learning for Visual Recognition, 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[39] Yingli Tian et al. Self-Supervised Visual Feature Learning With Deep Neural Networks: A Survey, 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[40] Heng Ji et al. RESIN: A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking System, 2021, NAACL.
[41] P. Merialdo et al. Knowledge Graph Embedding for Link Prediction: A Comparative Analysis, 2021, ACM Trans. Knowl. Discov. Data.
[42] Guilin Qi et al. Richpedia: A Large-Scale, Comprehensive Multi-Modal Knowledge Graph, 2020, Big Data Res.
[43] Feiliang Ren et al. Knowledge Graph Embedding with Atrous Convolution and Residual Learning, 2020, COLING.
[44] Jure Leskovec et al. Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs, 2020, NeurIPS.
[45] Andrew Zisserman et al. Self-supervised Co-training for Video Representation Learning, 2020, NeurIPS.
[46] Weifeng Zhang et al. Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering, 2020, Pattern Recognit.
[47] Ying Lin et al. A Joint Neural Model for Information Extraction with Global Features, 2020, ACL.
[48] Iryna Gurevych et al. Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers, 2020, DEELIO.
[49] Weiming Dong et al. Self-Supervised Feature Augmentation for Large Image Object Detection, 2020, IEEE Transactions on Image Processing.
[50] Wanxiang Che et al. Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting, 2020, EMNLP.
[51] Jianfeng Gao et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks, 2020, ECCV.
[52] Jianlong Fu et al. Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers, 2020, ArXiv.
[53] Li Dong et al. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers, 2020, NeurIPS.
[54] Jure Leskovec et al. Query2box: Reasoning over Knowledge Graphs in Vector Space using Box Embeddings, 2020, ICLR.
[55] Yu Cheng et al. UNITER: UNiversal Image-TExt Representation Learning, 2019, ECCV.
[56] Furu Wei et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations, 2019, ICLR.
[57] André Susano Pinto et al. A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark, 2019, ArXiv:1910.04867.
[58] Mohit Bansal et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers, 2019, EMNLP.
[59] Cho-Jui Hsieh et al. VisualBERT: A Simple and Performant Baseline for Vision and Language, 2019, ArXiv.
[60] Stefan Lee et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, 2019, NeurIPS.
[61] Omer Levy et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[62] Daniel Zeng et al. Multimodal Data Enhanced Representation Learning for Knowledge Graphs, 2019, 2019 International Joint Conference on Neural Networks (IJCNN).
[63] Zhou Yu et al. Deep Modular Co-Attention Networks for Visual Question Answering, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[64] Ali Farhadi et al. OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[65] Jongtack Kim et al. Combination of Multiple Global Descriptors for Image Retrieval, 2019, ArXiv.
[66] David S. Rosenblum et al. MMKG: Multi-Modal Knowledge Graphs, 2019, ESWC.
[67] Bowen Zhou et al. End-to-end Structure-Aware Convolutional Networks for Knowledge Base Completion, 2018, AAAI.
[68] Jian-Yun Nie et al. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space, 2018, ICLR.
[69] Daniel Oñoro-Rubio et al. Answering Visual-Relational Queries in Web-Extracted Knowledge Graphs, 2017, AKBC.
[70] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[71] Fenglong Ma et al. EANN: Event Adversarial Neural Networks for Multi-Modal Fake News Detection, 2018, KDD.
[72] Jure Leskovec et al. Embedding Logical Queries on Knowledge Graphs, 2018, NeurIPS.
[73] Iryna Gurevych et al. A Multimodal Translation-Based Approach for Knowledge Graph Representation Learning, 2018, *SEMEVAL.
[74] Omer Levy et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[75] Gregory Shakhnarovich et al. Discriminability Objective for Training Descriptive Captions, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[76] Daisy Zhe Wang et al. GAIA - A Multi-media Multi-lingual Knowledge Extraction and Hypothesis Generation System, 2018, TAC.
[77] Wenhan Xiong et al. DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning, 2017, EMNLP.
[78] Jure Leskovec et al. Inductive Representation Learning on Large Graphs, 2017, NIPS.
[79] Heiko Paulheim et al. Knowledge graph refinement: A survey of approaches and evaluation methods, 2016, Semantic Web.
[80] Yash Goyal et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering, 2016, International Journal of Computer Vision.
[81] Huanbo Luan et al. Image-embodied Knowledge Representation Learning, 2016, IJCAI.
[82] Max Welling et al. Semi-Supervised Classification with Graph Convolutional Networks, 2016, ICLR.
[83] Fei-Fei Li et al. Deep visual-semantic alignments for generating image descriptions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[84] Gerhard Weikum et al. YAGO: A Multilingual Knowledge Base from Wikipedia, Wordnet, and Geonames, 2016, SEMWEB.
[85] Licheng Yu et al. Modeling Context in Referring Expressions, 2016, ECCV.
[86] Xiaogang Wang et al. DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[87] Guillaume Bouchard et al. Complex Embeddings for Simple Link Prediction, 2016, ICML.
[88] Gu-Yeon Wei et al. Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[89] Trevor Darrell et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding, 2016, EMNLP.
[90] Michael S. Bernstein et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, 2016, International Journal of Computer Vision.
[91] Benjamin Bustos et al. IMGpedia: Enriching the Web of Data with Image Content Analysis, 2016, AMW.
[92] John Miller et al. Traversing Knowledge Graphs in Vector Space, 2015, EMNLP.
[93] Vicente Ordonez et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes, 2014, EMNLP.
[94] Peter Young et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions, 2014, TACL.
[95] Jason Weston et al. Translating Embeddings for Modeling Multi-relational Data, 2013, NIPS.
[96] Tom M. Mitchell et al. Random Walk Inference and Learning in A Large Scale Knowledge Base, 2011, EMNLP.