EDIS: Entity-Driven Image Search over Multimodal Web Content
Siqi Liu | Wenhu Chen | W. Wang | Weixi Feng
[1] Wittawat Jitkrittum,et al. A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch , 2022, ECCV.
[2] Bryan A. Plummer,et al. NewsStories: Illustrating articles with visual summaries , 2022, ECCV.
[3] Conghui Hu,et al. Feature Representation Learning for Unsupervised Cross-domain Image Retrieval , 2022, ECCV.
[4] Mike Zheng Shou,et al. GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval , 2022, ECCV.
[5] Errui Ding,et al. ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Carl Vondrick,et al. There’s a Time and Place for Reasoning Beyond the Image , 2022, ACL.
[7] S. Hoi,et al. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation , 2022, ICML.
[8] Tsu-Jui Fu,et al. VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling , 2021, ArXiv.
[9] Zi-Yi Dou,et al. An Empirical Study of Training End-to-End Vision-and-Language Transformers , 2021, CVPR.
[10] Yonatan Bisk,et al. WebQA: Multihop and Multimodal QA , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Adams Wei Yu,et al. SimVLM: Simple Visual Language Model Pretraining with Weak Supervision , 2021, ICLR.
[12] Junnan Li,et al. Align before Fuse: Vision and Language Representation Learning with Momentum Distillation , 2021, NeurIPS.
[13] Yejin Choi,et al. Misinfo Reaction Frames: Reasoning about Readers’ Reactions to News Headlines , 2021, ACL.
[14] Andrew Zisserman,et al. Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[15] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[16] Quoc V. Le,et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision , 2021, ICML.
[17] Jordi Pont-Tuset,et al. Telling the What while Pointing to the Where: Multimodal Queries for Image Retrieval , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[18] Wonjae Kim,et al. ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision , 2021, ICML.
[19] Lei Zhang,et al. VinVL: Making Visual Representations Matter in Vision-Language Models , 2021, ArXiv.
[20] Vicente Ordonez,et al. Visual News: Benchmark and Challenges in News Image Captioning , 2020, EMNLP.
[21] Jason Baldridge,et al. Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO , 2020, EACL.
[22] Lexing Xie,et al. Transform and Tell: Entity-Aware News Image Captioning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Jianfeng Gao,et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks , 2020, ECCV.
[24] Danqi Chen,et al. Dense Passage Retrieval for Open-Domain Question Answering , 2020, EMNLP.
[25] Jordi Pont-Tuset,et al. Connecting Vision and Language with Localized Narratives , 2019, ECCV.
[26] Yu Cheng,et al. UNITER: UNiversal Image-TExt Representation Learning , 2019, ECCV.
[27] Nan Duan,et al. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training , 2019, AAAI.
[28] Iryna Gurevych,et al. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks , 2019, EMNLP.
[29] Cho-Jui Hsieh,et al. VisualBERT: A Simple and Performant Baseline for Vision and Language , 2019, ArXiv.
[30] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[31] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, International Journal of Computer Vision.
[32] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[33] Hugo Zaragoza,et al. The Probabilistic Relevance Framework: BM25 and Beyond , 2009, Found. Trends Inf. Retr.
[34] Xiang Ren,et al. NewsEdits: A News Article Revision Dataset and a Novel Document-Level Reasoning Challenge , 2022, NAACL.
[35] Mark Dredze,et al. Updated Headline Generation: Creating Updated Summaries for Evolving News Stories , 2022, ACL.
[36] Natalie Schluter,et al. MassiveSumm: a very large-scale, very multilingual, news summarisation dataset , 2021, EMNLP.
[37] Martha Palmer,et al. NewsClaims: A New Benchmark for Claim Detection from News with Background Knowledge , 2021, ArXiv.