MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
Julian Martin Eisenschlos | Kenton Lee | Y. Altun | Fangyu Liu | Mandar Joshi | Francesco Piccinno | Syrine Krichene | Chenxi Pang | Nigel Collier
[1] Julian Martin Eisenschlos,et al. DePlot: One-shot visual language reasoning by plot-to-table translation , 2022, ACL.
[2] William W. Cohen,et al. Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks , 2022, ArXiv.
[3] Jamie Callan,et al. PAL: Program-aided Language Models , 2022, ICML.
[4] Julian Martin Eisenschlos,et al. Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding , 2022, ArXiv.
[5] Dragomir R. Radev,et al. Binding Language Models in Symbolic Languages , 2022, ICLR.
[6] Ashish V. Thapliyal,et al. PaLI: A Jointly-Scaled Multilingual Language-Image Model , 2022, ICLR.
[7] Radu Soricut,et al. PreSTU: Pre-Training for Scene-Text Understanding , 2022, ArXiv.
[8] Yuhuai Wu,et al. Insights into Pre-training via Simpler Synthetic Tasks , 2022, NeurIPS.
[9] Furu Wei,et al. LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking , 2022, ACM Multimedia.
[10] Vlad I. Morariu,et al. End-to-end Document Recognition and Understanding with Dessurt , 2022, ECCV Workshops.
[11] Shafiq R. Joty,et al. ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning , 2022, FINDINGS.
[12] Shafiq R. Joty,et al. Chart-to-Text: A Large-Scale Benchmark for Chart Summarization , 2022, ACL.
[13] Dale Schuurmans,et al. Chain of Thought Prompting Elicits Reasoning in Large Language Models , 2022, NeurIPS.
[14] Weizhu Chen,et al. Reasoning Like Program Executors , 2022, EMNLP.
[15] Dongyoon Han,et al. OCR-Free Document Understanding Transformer , 2021, ECCV.
[16] Dani Lischinski,et al. Classification-Regression for Chart Comprehension , 2021, ECCV.
[17] Nigel Collier,et al. Visually Grounded Reasoning across Languages and Cultures , 2021, EMNLP.
[18] Jaemin Cho,et al. Unifying Vision-and-Language Tasks via Text Generation , 2021, ICML.
[19] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[20] Thomas Muller,et al. Understanding tables with intermediate pre-training , 2020, FINDINGS.
[21] Thomas Muller,et al. TaPas: Weakly Supervised Table Parsing via Pre-training , 2020, ACL.
[22] Jonathan Berant,et al. Injecting Numerical Reasoning Skills into Language Models , 2020, ACL.
[23] Furu Wei,et al. LayoutLM: Pre-training of Text and Layout for Document Image Understanding , 2019, KDD.
[24] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res.
[25] Mitesh M. Khapra,et al. PlotQA: Reasoning over Scientific Plots , 2019, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).
[26] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[27] Pushmeet Kohli,et al. Analysing Mathematical Reasoning Abilities of Neural Models , 2019, ICLR.
[28] Gabriel Stanovsky,et al. DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs , 2019, NAACL.
[29] Yoav Artzi,et al. A Corpus for Reasoning about Natural Language Grounded in Photographs , 2018, ACL.
[30] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[31] Brian L. Price,et al. DVQA: Understanding Data Visualizations via Question Answering , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[32] Yoshua Bengio,et al. FigureQA: An Annotated Figure Dataset for Visual Reasoning , 2017, ICLR.
[33] Yoav Artzi,et al. A Corpus of Natural Language for Visual Reasoning , 2017, ACL.
[34] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[35] Xiangyu Wang,et al. What is a visual language? , 2017, J. Vis. Lang. Comput.
[36] Li Fei-Fei,et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Dan Klein,et al. Neural Module Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Neil Cohn,et al. The Visual Language of Comics: Introduction to the Structure and Cognition of Sequential Images , 2013.