Learning Domain-Specific Discourse Rules for Information Extraction

This paper describes a system that learns discourse rules for domain-specific analysis of unrestricted text. The goal of discourse analysis in this context is to transform locally identified references to relevant information in the text into a coherent representation of the entire text. This involves a complex series of decisions about merging coreferential objects, filtering out irrelevant information, inferring missing information, and identifying logical relations between domain objects. The Wrap-Up discourse analyzer induces a set of classifiers from a training corpus to handle these discourse decisions. Wrap-Up is fully trainable, and not only determines what classifiers are needed based on domain output specifications, but automatically selects the features needed by each classifier. Wrap-Up’s classifiers blend linguistic knowledge with real-world domain knowledge.