Paper Information - Information Extraction based Multiple-Category Document Classification for the Global Legal Information Network

This paper describes a prototype application of an information extraction (IE) based document classification system in the international law domain. IE is used to determine whether a set of concepts for a class is present in a document. The syntactic and semantic constraints that must be satisfied to make this determination are derived automatically from a training corpus. A collection of IE systems is arranged in a classification hierarchy, and novel documents are guided down the hierarchy based on the results from the previous level. Experimental results for a research prototype are given on a subset of the Global Legal Information Network domain.
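
The hierarchical routing described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: plain keyword sets stand in for the syntactic and semantic constraints that the paper derives from a training corpus, and every name here (`ClassNode`, `accepts`, `classify`, the category labels and trigger words) is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClassNode:
    """One class in the hierarchy; `concepts` is a stand-in for a trained IE system."""
    name: str
    # Each set lists alternative trigger words for one concept (assumed, not learned).
    concepts: list[set[str]]
    children: list[ClassNode] = field(default_factory=list)

    def accepts(self, tokens: set[str]) -> bool:
        # A class matches only if every one of its concepts has a trigger present.
        return all(tokens & triggers for triggers in self.concepts)


def classify(node: ClassNode, text: str) -> list[str]:
    """Return every class reached while guiding the document down from `node`."""
    tokens = set(text.lower().split())
    labels: list[str] = []
    for child in node.children:
        if child.accepts(tokens):
            labels.append(child.name)
            # Only documents accepted at this level are passed to the next one.
            labels.extend(classify(child, text))
    return labels


root = ClassNode("root", [], [
    ClassNode("trade", [{"tariff", "import", "export"}], [
        ClassNode("trade/customs", [{"customs"}, {"duty", "duties"}]),
    ]),
    ClassNode("labor", [{"employment", "wages", "worker"}]),
])

print(classify(root, "Decree setting customs duty rates on import of grain"))
```

Because each level gates the next, a document that mentions customs duties but none of the assumed trade triggers never reaches the `trade/customs` node. A document can also match several sibling classes, which is what makes the result multiple-category.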

Nabil R. Adam | Richard D. Holowczak

[1] N. Adam, et al. The Global Legal Information Network ("GLIN"), 1996.

[2] Richard Holowczak. Extractors for digital library objects, 1997.

[3] Ellen Riloff, et al. Information extraction as a basis for high-precision text classification, 1994, TOIS.

[4] Joe McCarthy. Tools and techniques for rapid porting, 1993, MUC.

[5] Hsinchun Chen. Machine learning for information retrieval: neural networks, symbolic learning, and genetic algorithms, 1995.

[6] David Fisher, et al. CRYSTAL: Inducing a Conceptual Dictionary, 1995, IJCAI.

[7] Ellen Riloff, et al. Automatically Acquiring Conceptual Patterns without an Annotated Corpus, 1995, VLC@ACL.

[8] Marc Goodman, et al. Prism: A Case-Based Telex Classifier, 1990, IAAI.

[9] Philip J. Hayes, et al. CONSTRUE/TIS: A System for Content-Based Indexing of a Database of News Stories, 1990, IAAI.

[10] Gerard Salton, et al. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, 1989.