DocFormerv2: Local Features for Document Understanding
Peng Tang | Qi Dong | Srikar Appalaraju | R. Manmatha | Nishant Sankaran | Yichu Zhou