Sequence modelling for sentence classification in a legal summarisation system

We describe a set of experiments using a wide range of machine learning techniques for the task of predicting the rhetorical status of sentences. The research is part of a text summarisation project for the legal domain, for which we use a new corpus of judgments of the UK House of Lords. Sentences are classified according to a rhetorical scheme that indicates each sentence's contribution to the overall argumentative structure of a judgment. We present experimental results for four learning algorithms from the Weka package (C4.5, naïve Bayes, Winnow and SVMs). We also report results using maximum entropy models, both in a standard classification framework and in a sequence labelling framework. The SVM classifier and the maximum entropy sequence tagger yield the most promising results.
