论文信息 - Key Information Extraction From Documents: Evaluation And Generator

Key Information Extraction From Documents: Evaluation And Generator

Extracting information from documents usually relies on natural language processing methods working on one-dimensional sequences of text. In some cases, for example, for the extraction of key information from semi-structured documents, such as invoice-documents, spatial and formatting information of text are crucial to understand the contextual meaning. Convolutional neural networks are already common in computer vision models to process and extract relationships in multidimensional data. Therefore, natural language processing models have already been combined with computer vision models in the past, to benefit from e.g. positional information and to improve performance of these key information extraction models. Existing models were either trained on unpublished data sets or on an annotated collection of receipts, which did not focus on PDF-like documents. Hence, in this research project a template-based document generator was created to compare state-of-theart models for information extraction. An existing information extraction model “Chargrid” (Katti et al., 2019) was reconstructed and the impact of a bounding box regression decoder, as well as the impact of an NLP pre-processing step was evaluated for information extraction from documents. The results have shown that NLP based pre-processing is beneficial for model performance. However, the use of a bounding box regression decoder increases the model performance only for fields that do not follow a rectangular shape.

[1] Zheng Huang,et al. ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction , 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR).

[2] Christian Reisswig,et al. BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding , 2019, ArXiv.

[3] Martín Ugarte,et al. Foundations of JSON Schema , 2016, WWW.

[4] Steffen Bickel,et al. Chargrid: Towards Understanding 2D Documents , 2018, EMNLP.

[5] Percy Liang,et al. Know What You Don’t Know: Unanswerable Questions for SQuAD , 2018, ACL.

[6] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[7] Tianqi Chen,et al. Empirical Evaluation of Rectified Activations in Convolutional Network , 2015, ArXiv.

[8] Xiaohui Zhao,et al. CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor , 2019, ArXiv.

[9] Wu Chou,et al. Design and Describe REST API without Violating REST: A Petri Net Based Approach , 2011, 2011 IEEE International Conference on Web Services.

[10] R. Smith,et al. An Overview of the Tesseract OCR Engine , 2007, Ninth International Conference on Document Analysis and Recognition (ICDAR 2007).

[11] Kevin Gimpel,et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations , 2019, ICLR.