DiT: Self-supervised Pre-training for Document Image Transformer
Junlong Li | Yiheng Xu | Tengchao Lv | Lei Cui | Cha Zhang | Furu Wei
[1] Furu Wei,et al. LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking , 2022, ACM Multimedia.
[2] Ross B. Girshick,et al. Benchmarking Detection Transfer Learning with Vision Transformers , 2021, ArXiv.
[3] Furu Wei,et al. Document AI: Benchmarks, Models and Applications , 2021, ArXiv.
[4] Tao Kong,et al. iBOT: Image BERT Pre-Training with Online Tokenizer , 2021, ArXiv.
[5] Ross B. Girshick,et al. Masked Autoencoders Are Scalable Vision Learners , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Bhargava Urala Kota,et al. DocFormer: End-to-End Transformer for Document Understanding , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[7] Matthijs Douze,et al. XCiT: Cross-Covariance Image Transformers , 2021, NeurIPS.
[8] Li Dong,et al. BEiT: BERT Pre-Training of Image Transformers , 2021, ICLR.
[9] Hongfu Liu,et al. SelfDoc: Self-Supervised Document Representation Learning , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Fei Huang,et al. StructuralLM: Structural Pre-training for Form Understanding , 2021, ACL.
[11] Julien Mairal,et al. Emerging Properties in Self-Supervised Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[12] Furu Wei,et al. LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding , 2021, ArXiv.
[13] Michael Bendersky,et al. LAMPRET: Layout-Aware Multimodal PreTraining for Document Understanding , 2021, ArXiv.
[14] Saining Xie,et al. An Empirical Study of Training Self-Supervised Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[15] Alec Radford,et al. Zero-Shot Text-to-Image Generation , 2021, ICML.
[16] Tomasz Dwojak,et al. Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer , 2021, ICDAR.
[17] Cha Zhang,et al. LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding , 2020, ACL.
[18] Matthieu Cord,et al. Training data-efficient image transformers & distillation through attention , 2020, ICML.
[19] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[20] Shashank Mujumdar,et al. Towards a Multi-modal, Multi-task Learning based Pre-training Framework for Document Representation Learning , 2020, ArXiv.
[21] Mark Chen,et al. Generative Pretraining From Pixels , 2020, ICML.
[22] Furu Wei,et al. DocBank: A Benchmark Dataset for Document Layout Analysis , 2020, COLING.
[23] Nicolas Usunier,et al. End-to-End Object Detection with Transformers , 2020, ECCV.
[24] Lukasz Garncarek,et al. LAMBERT: Layout-Aware Language Modeling for Information Extraction , 2020, ICDAR.
[25] Furu Wei,et al. LayoutLM: Pre-training of Text and Layout for Document Image Understanding , 2019, KDD.
[26] Antonio Jimeno-Yepes,et al. Image-based table recognition: data, model, and evaluation , 2019, ECCV.
[27] Kai Chen,et al. Real-time Scene Text Detection with Differentiable Binarization , 2019, AAAI.
[28] Yu Fang,et al. ICDAR 2019 Competition on Table Detection and Recognition (cTDaR) , 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR).
[29] Antonio Jimeno-Yepes,et al. PubLayNet: Largest Dataset Ever for Document Layout Analysis , 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR).
[30] Arnab Nandi,et al. Deterministic Routing between Layout Abstractions for Multi-Scale Classification of Visually Rich Documents , 2019, IJCAI.
[31] Jean-Philippe Thiran,et al. FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents , 2019, 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW).
[32] Zhoujun Li,et al. TableBank: Table Benchmark for Image-based Table Detection and Recognition , 2019, LREC.
[33] Ujjwal Bhattacharya,et al. Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks , 2018, 2018 24th International Conference on Pattern Recognition (ICPR).
[34] Nuno Vasconcelos,et al. Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[35] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[36] Marcus Liwicki,et al. Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).
[37] Ross B. Girshick,et al. Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[38] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Kilian Q. Weinberger,et al. Deep Networks with Stochastic Depth , 2016, ECCV.
[40] Konstantinos G. Derpanis,et al. Evaluation of deep convolutional nets for document image classification and retrieval , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).
[41] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[42] Shlomo Argamon,et al. Building a test collection for complex document information processing , 2006, SIGIR.
[43] Furu Wei,et al. XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding , 2022, FINDINGS.
[44] Stephen Lin,et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[45] Teakgyu Hong,et al. BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents , 2020, AAAI.
[46] Thomas Keller,et al. Towards a multi-modal IT , 2018.