Cross-Domain Contract Element Extraction with a Bi-directional Feedback Clause-Element Relation Network

Contract element extraction (CEE) is the novel task of automatically identifying and extracting legally relevant elements such as contract dates, payments, and legislation references from contracts. Automatic methods for this task view it as a sequence labeling problem and dramatically reduce human labor. However, as contract genres and element types may vary widely, a significant challenge for this sequence labeling task is how to transfer knowledge from one domain to another, i.e., cross-domain CEE. Cross-domain CEE differs from cross-domain named entity recognition (NER) in two important ways. First, contract elements are far more fine-grained than named entities, which hinders the transfer of extractors. Second, the extraction zones for cross-domain CEE are much larger than for cross-domain NER. As a result, the contexts of elements from different domains can be more diverse. We propose a framework, the Bi-directional Feedback cLause-Element relaTion network (Bi-FLEET), for the cross-domain CEE task that addresses the above challenges. Bi-FLEET has three main components: (1) a context encoder, (2) a clause-element relation encoder, and (3) an inference layer. To incorporate invariant knowledge about element and clause types, a clause-element graph is constructed across domains and a hierarchical graph neural network is adopted in the clause-element relation encoder. To reduce the influence of context variations, a multi-task framework with a bi-directional feedback scheme is designed in the inference layer, conducting both clause classification and element extraction. The experimental results over both cross-domain NER and CEE tasks show that Bi-FLEET significantly outperforms state-of-the-art baselines.
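The sequence labeling view mentioned above can be illustrated with a minimal sketch. It assumes a BIO tagging scheme and two hypothetical element types (`DATE`, `LEGISLATION`); the abstract does not specify the tag set, and the function name `extract_elements` is ours. The sketch only shows how per-token labels are turned into extracted elements, not how Bi-FLEET predicts them.

```python
from typing import List, Tuple


def extract_elements(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (element_type, text) spans from BIO tags (assumed scheme)."""
    spans: List[Tuple[str, str]] = []
    current_type = None          # type of the span currently being built
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new element begins; close any span that is still open.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the open span of the same type.
            current_tokens.append(token)
        else:
            # "O" tag, or an I- tag that does not continue the open span:
            # close the open span and ignore this token.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []

    # Close a span that runs to the end of the sentence.
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical example sentence and gold tags.
tokens = ["Effective", "as", "of", "1", "January", "2020", "under",
          "the", "Companies", "Act", "."]
tags = ["O", "O", "O", "B-DATE", "I-DATE", "I-DATE", "O",
        "O", "B-LEGISLATION", "I-LEGISLATION", "O"]
elements = extract_elements(tokens, tags)
```

Under this view, cross-domain transfer amounts to reusing a tagger trained on one contract genre to predict such tags on another, where both the element types and their surrounding contexts may differ.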
