Toward adding knowledge to learning algorithms for indexing legal cases

Case-based reasoning (CBR) systems have shown great promise for legal argumentation, but their development and wider availability are still slowed by the cost of manually representing cases. In this paper, we present our recent progress toward automatically indexing legal opinion texts for a CBR system. Our system, SMILE, uses a classification-based approach to find abstract fact situations in legal texts. To reduce the complexity inherent in legal texts, we use the individual sentences from a marked-up collection of case summaries as training examples. We illustrate how integrating a legal thesaurus and linguistic information with a machine learning algorithm can help to overcome the difficulties created by legal language. We report results from a preliminary experiment with a decision tree learning algorithm. The results indicate that learning on the basis of sentences, rather than full documents, is effective. They also confirm that adding a legal thesaurus to the learning algorithm improves performance for some, but not all, indexing concepts.
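To make the idea of combining a thesaurus with sentence-level learning concrete, the following is a minimal sketch and not SMILE's actual implementation. It assumes a tiny hypothetical thesaurus, invented example sentences and labels, and a depth-1 decision tree (a "stump") standing in for the decision tree learner; the names `THESAURUS`, `EXAMPLES`, `features`, and `best_stump` are all illustrative.

```python
import re

# Hypothetical thesaurus: maps surface words to a shared concept label.
THESAURUS = {
    "covenant": "AGREEMENT",
    "contract": "AGREEMENT",
    "agreement": "AGREEMENT",
    "revealed": "DISCLOSE",
}

# Invented sentences, labeled True if they express the indexing concept
# "an agreement was in place" (an assumption made for illustration only).
EXAMPLES = [
    ("The employee signed a nondisclosure covenant.", True),
    ("Both parties entered into a confidentiality contract.", True),
    ("The defendant accepted the written agreement.", True),
    ("The plaintiff sold a product to the public.", False),
    ("The formula was revealed to a customer.", False),
]


def features(sentence, thesaurus=None):
    """Return the set of binary features for one sentence.

    Features are the lowercase word tokens; if a thesaurus is given,
    the concept label of every known token is added as an extra feature.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    feats = set(tokens)
    if thesaurus:
        feats |= {thesaurus[t] for t in tokens if t in thesaurus}
    return feats


def best_stump(examples, thesaurus=None):
    """Pick the single feature whose presence best predicts the label.

    This is a depth-1 decision tree chosen by training accuracy,
    a simplification of a full decision tree learner.
    """
    featurized = [(features(s, thesaurus), y) for s, y in examples]
    vocab = sorted(set().union(*(f for f, _ in featurized)))

    def accuracy(feat):
        hits = sum((feat in f) == y for f, y in featurized)
        return hits / len(featurized)

    best = max(vocab, key=accuracy)
    return best, accuracy(best)


if __name__ == "__main__":
    print("words only:     ", best_stump(EXAMPLES))
    print("with thesaurus: ", best_stump(EXAMPLES, THESAURUS))
```

In this toy setting, no single word occurs in all three positive sentences, so a word-only stump cannot separate the classes, whereas the shared thesaurus concept can. This is only meant to illustrate why a thesaurus may help for some indexing concepts; it says nothing about the magnitude of the effect reported in the paper.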