Legal Documents Clustering and Summarization using Hierarchical Latent Dirichlet Allocation

Received Jul 30, 2012; revised Oct 27, 2012; accepted Jan 07, 2013.

In a common law system, as in India, judicial decisions are a significant source for applying and understanding the law. Online access to Indian legal judgments in digital form creates both opportunities and challenges for the legal community and for information technology researchers. The judgments must be organized, analyzed, and presented so that legal professionals can understand them quickly and reach decisions pertaining to a present case. In this paper we propose an approach that clusters legal judgments based on topics obtained from hierarchical Latent Dirichlet Allocation (hLDA), using a similarity measure between topics and documents, and that produces a summary of each document from the same topics. The resulting topic-based model groups legal judgments into clusters and generates a summary of each judgment in a cluster more effectively than our previous approach [1].
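The two steps named in the abstract, assigning each judgment to its most similar topic and then summarizing it with that topic, can be illustrated with a minimal sketch. It does not run hLDA: the topic-word weights in `TOPICS` are hard-coded stand-ins for topics an hLDA model would learn, the toy judgments in `DOCS` are invented, and cosine similarity is an assumption, since the abstract does not say which similarity measure is used. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical topic-word weights; in the paper these would come from hLDA.
TOPICS = {
    "property": {"land": 0.4, "tenant": 0.3, "lease": 0.3},
    "criminal": {"accused": 0.4, "murder": 0.3, "evidence": 0.3},
}

# Invented toy judgments, for illustration only.
DOCS = {
    "j1": "The tenant refused to vacate the land. The lease had expired last year. "
          "The court heard both sides.",
    "j2": "The accused was charged with murder. The evidence was circumstantial. "
          "The appeal was dismissed.",
}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def assign_cluster(doc, topics):
    """Return the name of the topic most similar to the document (assumed measure: cosine)."""
    vec = Counter(tokenize(doc))
    return max(topics, key=lambda name: cosine(vec, topics[name]))

def summarize(doc, topic, n=1):
    """Keep the n sentences most similar to the topic, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    ranked = sorted(sentences,
                    key=lambda s: cosine(Counter(tokenize(s)), topic),
                    reverse=True)
    keep = set(ranked[:n])
    return [s for s in sentences if s in keep]

if __name__ == "__main__":
    for doc_id, text in DOCS.items():
        cluster = assign_cluster(text, TOPICS)
        print(doc_id, cluster, summarize(text, TOPICS[cluster], n=1))
```

Reusing the same topic vector for both clustering and sentence scoring mirrors the abstract's point that the summary is found "using the same topics"; a real system would replace `TOPICS` with distributions inferred from the judgment corpus.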

[1]  Karen Sparck Jones.  Automatic keyword classification for information retrieval, 1971.

[2]  Claire Grover, et al.  Summarising Legal Texts: Sentential Tense and Argumentative Roles, 2003, HLT-NAACL 2003.

[3]  K. Raghuveer, et al.  Legal Document Summarization using Latent Dirichlet Allocation, 2012.

[4]  Thomas L. Griffiths, et al.  The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies, 2007, JACM.

[5]  Chinatsu Aone, et al.  Fast and effective text mining using linear-time document clustering, 1999, KDD '99.

[6]  Marc Moens, et al.  Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status, 2002, CL.

[7]  Jack G. Conrad, et al.  Legal document clustering with built-in topic segmentation, 2011, CIKM '11.

[8]  Frank D. Wood, et al.  Hierarchically Supervised Latent Dirichlet Allocation, 2011, NIPS.

[9]  George A. Vouros, et al.  Non-Parametric Estimation of Topic Hierarchies from Texts with Hierarchical Dirichlet Processes, 2011, J. Mach. Learn. Res.

[10]  Michael I. Jordan, et al.  Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[11]  T. Landauer, et al.  Indexing by Latent Semantic Analysis, 1990.

[12]  Claire Grover, et al.  Sequence modelling for sentence classification in a legal summarisation system, 2005, SAC '05.

[13]  Guy Lapalme, et al.  Legal Text Summarization by Exploration of the Thematic Structure and Argumentative Roles, 2004.

[14]  W. Bruce Croft, et al.  LDA-based document models for ad-hoc retrieval, 2006, SIGIR.