Vilio: State-of-the-art Visio-Linguistic Models applied to Hateful Memes

This work presents Vilio, an implementation of state-of-the-art visio-linguistic models and their application to the Hateful Memes Dataset. The implemented models have been adapted into a uniform codebase and modified to improve performance. The goal of Vilio is to provide a user-friendly starting point for any visio-linguistic problem. An ensemble of five different V+L models implemented in Vilio achieves 2nd place in the Hateful Memes Challenge out of 3,300 participants. The code is available at this https URL.
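
As an illustration of how predictions from several V+L models might be combined into a single submission, here is a minimal sketch that averages per-model probabilities. The file names, the `id`/`proba` column layout, and the simple mean are assumptions made for demonstration; they are not necessarily the output format or ensembling scheme actually used by Vilio.

```python
import pandas as pd

# Hypothetical per-model prediction files; Vilio's real outputs may be named and laid out differently.
files = ["visualbert.csv", "uniter.csv", "oscar.csv", "ernie_vil.csv", "lxmert.csv"]

# Each CSV is assumed to contain two columns: id (meme id) and proba (predicted probability of "hateful").
# Load every model's predictions and align them on the meme id.
frames = [pd.read_csv(f).sort_values("id").reset_index(drop=True) for f in files]

# Average the probabilities across models and threshold at 0.5 for a hard label.
ensemble = frames[0][["id"]].copy()
ensemble["proba"] = sum(df["proba"] for df in frames) / len(frames)
ensemble["label"] = (ensemble["proba"] >= 0.5).astype(int)

ensemble.to_csv("ensemble_submission.csv", index=False)
```

A plain mean is only the simplest option; weighted averaging or rank averaging are common variants when some models are clearly stronger than others.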
