A Generic Platform to Automate Legal Knowledge Work Process Using Machine Learning

Managing legal contracts in business domains such as real estate is a typical business process outsourcing activity. One such process is lease abstraction, in which offshore experts manually inspect and validate large commercial lease documents drawn up for real estate deals and extract the relevant information into a structured form. Large real estate firms then use this structured information for aggregate analytics and decision making. We propose a system based on machine learning techniques that semi-automates this process, leading to a 50% saving in human effort. Our approach combines state-of-the-art machine learning techniques, including supervised classifiers, sequence models, and several semi-supervised approaches. We demonstrate the effectiveness of our solution through experimental results. Our platform is in production use at Accenture Operations, and the initial results and user feedback are encouraging.
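As a rough illustration of the supervised-classification step, the sketch below treats lease abstraction as sentence classification: each sentence of a lease is assigned the abstraction field it most likely describes. The field names (`rent`, `term`, `renewal`), the toy training sentences, the class `NaiveBayesFieldClassifier`, and the choice of multinomial naive Bayes with add-one smoothing are all hypothetical assumptions made for this example; they are not the classifier or data used in the platform described here.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple preprocessing step."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesFieldClassifier:
    """Hypothetical multinomial naive Bayes that maps a sentence to a lease field."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)      # number of sentences per field
        self.word_counts = defaultdict(Counter)  # token frequencies per field
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(w.values()) for c, w in self.word_counts.items()}
        return self

    def predict(self, sentence):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior of the field
            for token in tokenize(sentence):
                if token not in self.vocab:
                    continue  # skip words never seen in training
                # Add-one (Laplace) smoothed log likelihood of the token.
                score += math.log(
                    (self.word_counts[label][token] + 1) / (self.total_words[label] + v)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training sentences labeled with hypothetical abstraction fields.
TRAIN = [
    ("The monthly base rent shall be $10,000 payable in advance.", "rent"),
    ("Tenant shall pay base rent on the first day of each month.", "rent"),
    ("The term of this lease commences on January 1, 2015.", "term"),
    ("This lease shall expire ten years after the commencement date.", "term"),
    ("Tenant may renew this lease for one additional term of five years.", "renewal"),
    ("Tenant shall have the option to renew upon written notice.", "renewal"),
]

clf = NaiveBayesFieldClassifier().fit([s for s, _ in TRAIN], [l for _, l in TRAIN])
print(clf.predict("Base rent is payable monthly."))
```

In a real system the predicted field would only route a sentence to a reviewer or to a downstream extractor for the actual value; the naive Bayes model here stands in for whatever supervised classifier is used, chosen only because it fits in a few self-contained lines.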
