Paper information - AURORA: An Information Extraction System of Domain-specific Business Documents with Limited Data

AURORA: An Information Extraction System of Domain-specific Business Documents with Limited Data

Information extraction is a well-known topic that plays a critical role in many NLP applications, since its outputs can serve as an entry point for digital transformation. However, gaps remain when applying research results to actual business cases. This paper introduces AURORA, an information extraction system for domain-specific business documents. The intuition behind AURORA is to use transfer learning for extraction. To do so, it uses transformers to cope with the limited training data available in business cases and stacks additional layers for domain adaptation. We demonstrate AURORA in actual scenarios in which users are invited to try two functions: fine-grained extraction and whole-paragraph extraction from Japanese business documents. A video of the system is available at http://y2u.be/xHQpYE41Tqw.
