Extracting Zero-shot Structured Information from Form-like Documents: Pretraining with Keys and Triggers

In this paper, we revisit the problem of extracting the values of a given set of key fields from form-like documents. This is a vital step in supporting many downstream applications, such as knowledge base construction, question answering, and document comprehension. Previous studies ignore the semantics of the given keys, treating them only as class labels, and are thus incapable of handling zero-shot keys. Moreover, although these models often employ the attention mechanism, the learned features may not serve as a faithful proxy for why a human would recognize a value as belonging to a key, and therefore generalize poorly to new documents. To address these issues, we propose a Key-Aware and Trigger-Aware (KATA) extraction model. Given an input key, it explicitly learns two mappings: from key representations to trigger representations, and from trigger representations to values. These two mappings are intended to be intrinsic and invariant across different keys and documents. We pre-train both mappings on a large training set automatically constructed from Wikipedia data. Experiments on two applications, after a fine-tuning step, show that the proposed model achieves more than 70% accuracy in extracting zero-shot keys, whereas previous methods all fail.
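To make the two-mapping idea concrete, the following is a minimal, hypothetical sketch (not the paper's neural model): mapping 1 takes a key to candidate trigger phrases, and mapping 2 locates a matched trigger in the form and returns the adjacent text as the value. The synonym table, function names, and fallback rule for unseen keys are all illustrative assumptions.

```python
# Illustrative sketch of KATA's two mappings (assumed names, not the paper's code):
# (1) key -> triggers: a key is mapped to plausible trigger phrases in the form;
# (2) trigger -> value: the text adjacent to a matched trigger is the value.

TRIGGER_SYNONYMS = {                     # mapping 1 (hand-built here; learned in KATA)
    "invoice_date": ["date", "invoice date", "issued on"],
    "total_amount": ["total", "amount due", "balance"],
}

def extract_value(key, lines):
    """Mapping 2: scan form lines; if a line starts with a trigger for the
    key, return the remainder of that line as the extracted value."""
    # Fallback for zero-shot keys: use the key name itself as the trigger.
    triggers = TRIGGER_SYNONYMS.get(key, [key.replace("_", " ")])
    for trigger in triggers:
        for line in lines:
            if line.lower().startswith(trigger + ":"):
                return line.split(":", 1)[1].strip()
    return None

form = ["Invoice Date: 2021-03-05", "Total: $1,280.00"]
print(extract_value("invoice_date", form))  # -> 2021-03-05
print(extract_value("total_amount", form))  # -> $1,280.00
```

Because mapping 2 operates on trigger positions rather than key labels, the same extraction logic applies to a key it has never seen, which is the intuition behind KATA's zero-shot behavior.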
