Classification and clustering for case-based criminal summary judgments

We investigate the effectiveness of machine-generated criteria for classification problems related to criminal summary judgments. Our system uses documents from closed lawsuits as training data to generate keyword-based and case-based classification criteria, and then applies these machine-generated criteria to the classification tasks. To construct databases of the classification criteria, we employ different levels of lexical knowledge to extract information from legal documents in Chinese, and build a case instance for each closed lawsuit. Experimental results indicate that case-based classification outperforms keyword-based classification, and that machine-generated cases can offer accuracy about 7% below that of human-provided cases. To improve the inference efficiency of our classifiers, we also design methods that merge the machine-generated criteria. Empirical results show that these methods keep classification quality within 20% of the quality achieved with human-provided cases, even when we aggressively reduce the number of machine-generated cases by about 70%.
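As a rough illustration of what case-based classification means here, the sketch below stores each closed lawsuit as a set of terms with a charge label and classifies a new document by a vote among its most similar stored cases. It is not the authors' method: the Jaccard similarity, the value of `k`, the function names, and the English toy terms and labels are all assumptions chosen for brevity, and the actual system works on segmented Chinese text.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(query_terms, cases, k=3):
    """Label a query by majority vote among its k most similar stored cases.

    `cases` is a list of (terms, label) pairs, one per closed lawsuit.
    """
    ranked = sorted(cases, key=lambda c: jaccard(query_terms, c[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy case base; terms and labels are illustrative only.
cases = [
    (["knife", "night", "wallet", "threat"], "robbery"),
    (["wallet", "crowd", "pocket"], "theft"),
    (["threat", "weapon", "cash"], "robbery"),
    (["shop", "unpaid", "goods"], "theft"),
]

print(classify(["weapon", "threat", "wallet"], cases, k=3))
```

Because every stored case is compared against each query, inference cost grows with the size of the case base, which is the motivation for merging cases to shrink it.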
