Transformers-based information extraction with limited data for domain-specific business documents

Abstract Information extraction plays an important role in data transformation for business applications. However, building extraction systems in practice faces two challenges: (i) labeled data are usually scarce and (ii) highly fine-grained classification is required. This paper introduces a model that addresses both challenges. Unlike prior studies, which usually require a large number of training samples, our extraction model is trained on a small amount of data to extract a large number of information types. To do so, the model exploits the contextual word representations of pre-trained language models, which are trained on huge general-domain corpora. To adapt these representations to the downstream task, the model employs transfer learning, stacking Convolutional Neural Networks on top of the encoder to learn hidden representations for classification. To confirm the efficiency of our method, we apply the model to two real document-processing cases: bidding documents and sales documents from two Japanese companies. Experimental results on real test sets show that, with a small amount of training data, our model achieves accuracy high enough to be accepted by our clients.
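The architecture outlined in the abstract (a pre-trained Transformer encoder providing contextual word representations, with stacked CNN layers as a transfer-learning head for classification) can be illustrated as follows. This is a minimal sketch, assuming a BERT encoder from the Hugging Face transformers library; the kernel sizes, channel counts, and number of labels are illustrative assumptions, not values reported in the paper.

```python
# Minimal sketch: pre-trained BERT for contextual word representation,
# with stacked 1D CNNs as a task-specific classification head.
# Hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn
from transformers import BertModel

class BertCnnExtractor(nn.Module):
    def __init__(self, num_labels: int, pretrained: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(pretrained)
        hidden = self.encoder.config.hidden_size  # 768 for bert-base
        # Stacked convolutions over the token dimension: the
        # transfer-learning head trained on the small labeled set.
        self.convs = nn.Sequential(
            nn.Conv1d(hidden, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.classifier = nn.Linear(256, num_labels)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden) contextual token representations
        reps = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        # Conv1d expects (batch, channels, seq_len)
        feats = self.convs(reps.transpose(1, 2)).transpose(1, 2)
        return self.classifier(feats)  # per-token label logits
```

In a low-resource setting like the one the paper targets, the encoder could be frozen and only the CNN head trained, or the whole stack fine-tuned with a small learning rate; the sketch leaves that choice to the training loop.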
