Automatic Query Generation from Legal Texts for Case Law Retrieval

This paper investigates automatic query generation from legal decisions and contributes a test collection for the evaluation of case law retrieval. For a sentence or paragraph within a legal decision that cites another decision, queries were automatically generated from a proportion of the terms in that sentence or paragraph. Manually generated queries were also created as a baseline against which to empirically compare the automatic methods. Automatically generated queries were found to be more effective than the average Boolean queries from experts. However, the best keyword and Boolean queries from experts significantly outperformed the automatic queries.
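As a rough illustration of generating a query from a proportion of the terms in a citing sentence, the sketch below ranks the unique terms of a passage and keeps the top fraction. The abstract does not specify how terms are scored, so the IDF-like score is an assumption made for illustration. The function name `generate_query`, the `proportion` parameter, the small stopword list, and the toy background collection are likewise hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "that", "is",
             "was", "for", "on", "by", "with", "as", "at", "it", "be"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def generate_query(passage, background, proportion=0.5):
    """Keep the top `proportion` of unique passage terms.

    Terms are ranked by an IDF-like score computed over `background`,
    a list of document strings. This scoring rule is an assumption.
    """
    terms = [t for t in tokenize(passage) if t not in STOPWORDS]
    unique = list(dict.fromkeys(terms))  # de-duplicate, keep first-seen order
    if not unique:
        return []

    # Document frequency of every term in the background collection.
    doc_freq = Counter()
    for doc in background:
        doc_freq.update(set(tokenize(doc)))
    n_docs = len(background)

    def score(term):
        # Smoothed inverse document frequency: rarer terms score higher.
        return math.log((n_docs + 1) / (doc_freq[term] + 1))

    k = max(1, math.ceil(proportion * len(unique)))
    ranked = sorted(unique, key=score, reverse=True)  # stable for ties
    keep = set(ranked[:k])
    # Return the kept terms in their original passage order.
    return [t for t in unique if t in keep]


if __name__ == "__main__":
    passage = "The court applied the duty of care test set out in Smith v Jones."
    background = [
        "the court held that the appeal was dismissed",
        "the court considered the duty owed",
        "the test for negligence",
    ]
    print(" ".join(generate_query(passage, background, proportion=0.5)))
```

Varying `proportion` between 0 and 1 trades shorter, more selective queries against longer ones that retain more of the citing passage.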
