Paper information - LexNLP: Natural Language Processing and Information Extraction For Legal and Regulatory Texts


LexNLP is an open-source Python package for natural language processing and machine learning on legal and regulatory text. The package includes functionality to (i) segment documents, (ii) identify key text such as titles and section headings, (iii) extract over eighteen types of structured information, such as distances and dates, (iv) extract named entities such as companies and geopolitical entities, (v) transform text into features for model training, and (vi) build unsupervised and supervised models such as word embedding or tagging models. LexNLP includes pre-trained models based on thousands of unit tests drawn from real documents available from the SEC EDGAR database as well as various judicial and regulatory proceedings. LexNLP is designed for use in both academic research and industrial applications.
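To make items (i) and (iii) concrete, the following is a minimal, standard-library-only sketch of sentence segmentation followed by extraction of one structured type (distances). It is not LexNLP's API or implementation: the function names `split_sentences` and `extract_distances`, the unit table, and the regular expressions are all assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical unit table for illustration only; not taken from LexNLP.
_UNITS = {
    "mile": "mile", "miles": "mile",
    "kilometer": "kilometer", "kilometers": "kilometer", "km": "kilometer",
    "foot": "foot", "feet": "foot",
}

# A number (optionally with a decimal part) followed by a known unit word.
# Longer unit names come first so "miles" is tried before "mile".
_DISTANCE_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*("
    + "|".join(sorted(_UNITS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)

# Naive boundary rule: sentence-ending punctuation, whitespace, then a capital.
_SENTENCE_RE = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences using the naive boundary rule above."""
    return [s.strip() for s in _SENTENCE_RE.split(text) if s.strip()]


def extract_distances(text: str) -> List[Tuple[float, str]]:
    """Return (amount, normalized unit) pairs found in the text."""
    return [
        (float(m.group(1)), _UNITS[m.group(2).lower()])
        for m in _DISTANCE_RE.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "The premises are located 2.5 miles from the depot. "
        "Tenant shall maintain a fence 10 feet high."
    )
    for sentence in split_sentences(sample):
        print(sentence, extract_distances(sentence))
```

A rule this simple breaks on abbreviations that are common in legal text (for example, "Inc." followed by a capitalized word), which is one reason a package aimed at this domain would rely on trained segmentation models rather than a single regular expression.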
