Information Extraction based Multiple-Category Document Classification for the Global Legal Information Network

This paper describes a prototype document classification system, based on information extraction (IE), for the international law domain. IE is used to determine whether the set of concepts that defines a class is present in a document. The syntactic and semantic constraints that must be satisfied to make this determination are derived automatically from a training corpus. A collection of IE systems is arranged in a classification hierarchy, and novel documents are guided down the hierarchy based on the results obtained at the previous level. Experimental results for a research prototype are given on a subset of the Global Legal Information Network domain.
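
As a rough illustration of the routing idea only, the following sketch guides a document down a small hierarchy in which each node carries its own concept detector. It rests on several assumptions: the `Node` type, the `has_all` and `classify` helpers, and the category labels are hypothetical, and a plain keyword check stands in for the automatically derived syntactic and semantic constraints, which are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One class in the hierarchy, with a detector for its concepts."""
    label: str
    matches: Callable[[str], bool]
    children: List["Node"] = field(default_factory=list)


def has_all(*terms: str) -> Callable[[str], bool]:
    """Hypothetical stand-in for an IE system: every term must occur."""
    def check(text: str) -> bool:
        lowered = text.lower()
        return all(term in lowered for term in terms)
    return check


def classify(node: Node, text: str) -> List[str]:
    """Return every label reached while guiding the text down the tree.

    A child is consulted only if its parent matched, so the result at one
    level decides whether the next level is tried at all. Several
    children may match, which gives multiple categories.
    """
    if not node.matches(text):
        return []
    labels = [node.label]
    for child in node.children:
        labels.extend(classify(child, text))
    return labels


# Illustrative categories only; these are not the actual GLIN classes.
hierarchy = Node("law", has_all("law"), [
    Node("law/trade", has_all("import", "tariff")),
    Node("law/environment", has_all("pollution"), [
        Node("law/environment/water", has_all("water")),
    ]),
])

doc = "This law sets a tariff on each import and limits water pollution."
print(classify(hierarchy, doc))
```

In this sketch a document mentioning water but not pollution never reaches the water node, because its parent did not match; that is the sense in which results from one level guide the next.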