ROPE: Reading Order Equivariant Positional Encoding for Graph-based Document Information Extraction

The natural reading order of words is crucial for information extraction from form-like documents. Although Graph Convolutional Networks (GCNs) have recently advanced the modeling of spatial layout patterns in documents, they have limited ability to capture the reading order of the word-level node representations in a graph. We propose Reading Order Equivariant Positional Encoding (ROPE), a new positional encoding technique designed to capture the sequential presentation of words in documents. Given a word-level graph connectivity, ROPE generates unique reading order codes for neighboring words relative to the target word. We study two fundamental document entity extraction tasks, word labeling and word grouping, on the public FUNSD dataset and a large-scale payment dataset. We show that ROPE consistently improves existing GCNs by a margin of up to 8.4% F1-score.
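The idea of assigning reading order codes to neighbors relative to a target word can be illustrated with a small sketch. This is one plausible reading of the abstract, not the authors' exact formulation: the function name `relative_reading_order_codes`, the signed-rank coding scheme, and the `reading_order` mapping (word id to a distinct position in the input word sequence) are all assumptions made for illustration.

```python
from typing import Dict, List


def relative_reading_order_codes(
    target: int,
    neighbors: List[int],
    reading_order: Dict[int, int],
) -> Dict[int, int]:
    """Assign each neighbor a signed rank relative to the target word.

    Hypothetical scheme (an assumption, not taken from the paper):
    the target gets code 0, neighbors read before it get -1, -2, ...
    (closest first), and neighbors read after it get +1, +2, ...
    Positions in `reading_order` are assumed to be distinct, and
    `neighbors` is assumed not to contain the target itself.
    """
    t = reading_order[target]
    # Neighbors that precede the target, closest to the target first.
    before = sorted(
        (n for n in neighbors if reading_order[n] < t),
        key=lambda n: reading_order[n],
        reverse=True,
    )
    # Neighbors that follow the target, closest to the target first.
    after = sorted(
        (n for n in neighbors if reading_order[n] > t),
        key=lambda n: reading_order[n],
    )
    codes = {target: 0}
    for rank, n in enumerate(before, start=1):
        codes[n] = -rank
    for rank, n in enumerate(after, start=1):
        codes[n] = rank
    return codes


# Toy example: five words with ids 10..14 listed in reading order.
reading_order = {10: 0, 11: 1, 12: 2, 13: 3, 14: 4}
codes = relative_reading_order_codes(12, [14, 10, 13], reading_order)

# Shifting every absolute position by a constant leaves the codes unchanged,
# because only the order relative to the target matters.
shifted = {w: p + 100 for w, p in reading_order.items()}
codes_shifted = relative_reading_order_codes(12, [14, 10, 13], shifted)
```

In this sketch the codes depend only on how neighbors are ordered around the target, not on their absolute indices in the document. In a GCN, such codes would typically be embedded and combined with the neighbor features before aggregation; that step is left out here.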
