ROPE: Reading Order Equivariant Positional Encoding for Graph-based Document Information Extraction

Natural reading orders of words are crucial for information extraction from form-like documents. Despite recent advances in Graph Convolutional Networks (GCNs) on modeling spatial layout patterns of documents, they have limited ability to capture reading orders of given word-level node representations in a graph. We propose Reading Order Equivariant Positional Encoding (ROPE), a new positional encoding technique designed to apprehend the sequential presentation of words in documents. ROPE generates unique reading order codes for neighboring words relative to the target word given a word-level graph connectivity. We study two fundamental document entity extraction tasks including word labeling and word grouping on the public FUNSD dataset and a large-scale payment dataset. We show that ROPE consistently improves existing GCNs with a margin up to 8.4% F1-score.
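As a rough illustration of "reading order codes for neighboring words relative to the target word", the sketch below assigns each neighbor a signed rank within the target's neighborhood. This is only one plausible reading of the abstract, not the authors' implementation: the function name `reading_order_codes`, the `order` and `neighbors` inputs, and the signed-rank coding scheme are all assumptions made for illustration.

```python
from typing import Dict, List


def reading_order_codes(
    order: List[int], neighbors: Dict[int, List[int]]
) -> Dict[int, Dict[int, int]]:
    """Assign each neighbor a code relative to its target word.

    order[i] is the reading-order position of word i (hypothetical input).
    neighbors[t] lists the graph neighbors of target word t (hypothetical input).
    Assumed scheme: neighbors read before the target get -1, -2, ...
    (nearest first); neighbors read after it get +1, +2, ...
    """
    codes: Dict[int, Dict[int, int]] = {}
    for target, nbrs in neighbors.items():
        # Neighbors that precede the target, nearest in reading order first.
        before = sorted(
            (n for n in nbrs if order[n] < order[target]),
            key=lambda n: order[n],
            reverse=True,
        )
        # Neighbors that follow the target, nearest in reading order first.
        after = sorted(
            (n for n in nbrs if order[n] > order[target]),
            key=lambda n: order[n],
        )
        code = {n: -(k + 1) for k, n in enumerate(before)}
        code.update({n: k + 1 for k, n in enumerate(after)})
        codes[target] = code
    return codes


# Five words in reading order; word 2 is connected to words 0, 1 and 4.
example = reading_order_codes([0, 1, 2, 3, 4], {2: [0, 1, 4]})

# Shifting every reading-order position by a constant leaves the codes unchanged,
# because only the relative order within the neighborhood is used.
shifted = reading_order_codes([10, 11, 12, 13, 14], {2: [0, 1, 4]})
```

Because the codes depend only on the relative order of words inside each neighborhood, they remain unique per neighbor and are unaffected by where the neighborhood sits in the overall sequence. In a GCN, such codes could be embedded and combined with the neighbor features before aggregation; that step is likewise an assumption here and is not shown.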

Chen-Yu Lee | Tomas Pfister | Siyang Qin | Chu Wang | Chun-Liang Li | Renshen Wang | Yasuhisa Fujii | Ashok Popat
