A Novel Approach of Augmenting Training Data for Legal Text Segmentation by Leveraging Domain Knowledge

In this era of information overload, text segmentation can be used effectively to locate and extract information specific to users' needs within huge collections of documents. Text segmentation refers to the task of dividing a document into smaller labeled text fragments according to the semantic commonality of their contents. Because legal text carries rich semantic information, text segmentation is crucial for information retrieval in the legal domain. However, segmentation framed as supervised classification requires a large amount of training data to build an efficient classifier, and collecting and manually annotating gold-standard data in NLP is very expensive. In the recent past, the question of whether such gold standards can be satisfactorily replaced with automatically annotated data has attracted growing interest. This work presents two approaches, based entirely on domain knowledge, for automatically generating training data that can then be used to segment court judgments.
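The abstract does not spell out the two approaches, so the following is only a minimal sketch of the general idea of deriving segment labels from domain knowledge, not the authors' method. It assumes that domain knowledge can be expressed as hypothetical cue phrases (`CUE_RULES`) marking rhetorical roles such as facts, arguments and decision, and that a judgment has already been split into sentences. Each sentence takes the label of the first matching cue, or otherwise inherits the previous sentence's label; runs of equal labels form segments, which are then flattened into (sentence, label) pairs usable as automatically annotated training data for a supervised classifier.

```python
import re
from dataclasses import dataclass

# Hypothetical cue phrases standing in for "domain knowledge"; NOT taken from the paper.
# Dict order matters: the first label whose pattern matches wins.
CUE_RULES: dict[str, list[str]] = {
    "facts": [r"\bthe facts of the case\b", r"\bbrief facts\b"],
    "arguments": [r"\bcounsel for the\b", r"\bit was contended\b"],
    "decision": [r"\bappeal is (?:allowed|dismissed)\b", r"\bwe hold that\b"],
}


@dataclass
class Segment:
    label: str
    sentences: list[str]


def label_sentence(sentence: str, default: str) -> str:
    """Return the first label whose cue pattern matches; otherwise keep the running label."""
    lowered = sentence.lower()
    for label, patterns in CUE_RULES.items():
        if any(re.search(p, lowered) for p in patterns):
            return label
    return default


def auto_segment(sentences: list[str], start_label: str = "facts") -> list[Segment]:
    """Group consecutive sentences that share a label into labeled segments."""
    segments: list[Segment] = []
    current = start_label  # assumption: text before any cue belongs to start_label
    for sent in sentences:
        current = label_sentence(sent, current)
        if segments and segments[-1].label == current:
            segments[-1].sentences.append(sent)
        else:
            segments.append(Segment(current, [sent]))
    return segments


def to_training_pairs(segments: list[Segment]) -> list[tuple[str, str]]:
    """Flatten segments into (sentence, label) pairs for training a classifier."""
    return [(s, seg.label) for seg in segments for s in seg.sentences]


if __name__ == "__main__":
    judgment = [
        "The brief facts of the case are as follows.",
        "The appellant was employed as a clerk.",
        "Counsel for the appellant argued that the dismissal was unlawful.",
        "It was contended that no inquiry was held.",
        "In the result, the appeal is allowed.",
    ]
    for seg in auto_segment(judgment):
        print(seg.label, len(seg.sentences))  # facts 2, arguments 2, decision 1
```

Inheriting the previous label encodes the assumption that segments are contiguous runs of sentences. Labels produced this way are noisy, so in practice they would usually be checked against a small manually annotated sample before being trusted as training data.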
