The valency of predicates is a key component of a lexical entry, because most, if not all, recent syntactic theories `project' syntactic structure from such information in the lexicon (e.g. Pollard & Sag, 1987). Therefore, a wide-coverage robust parser utilising a grammar based on one of these theories must have access to an accurate dictionary encoding (at a minimum) valency information, and probably further details of argument structure. However, as designers of natural language processing systems have observed (e.g. Jensen, 1991), valency is closely associated with lexical sense, and the senses of a word change between corpora, sublanguages and/or subject domains. Jensen et al (1994) take this as evidence that the coupling between syntactic parsing and valency information should be much weaker than in current syntactic theories. From a more theoretical standpoint, Grimshaw (1990), Pustejovsky (1993) and others have argued that valency should instead be projected from lexical semantic information. In a recent experiment with a wide-coverage parsing system utilising a grammatical framework based on standard lexicalist assumptions, Briscoe & Carroll (1993) observed that over half the analysis failures on unseen corpus examples were caused by incorrect subcategorisation for predicate valency. Because of the close connection between sense and valency, and between subject domain and sense, it may be that a fully accurate `static' valency dictionary of the language is unattainable. The work we describe below could equally support a large-scale attempt to construct such a dictionary from substantial quantities of corpus material, or the less ambitious and more frequent construction of `disposable' dictionaries, or augmentation of `self-updating' dictionaries, as and when new corpora need to be parsed.

We have developed a system which is potentially capable of delivering putative lexical entries for predicates extracted from textual corpora, focussing on the acquisition of `argument structure' (defined as valency, semantic selectional restrictions/preferences, diathesis alternations, bounded dependency rules, such as passive or particle movement, and control of understood arguments in predicative complements), though so far we have mostly explored predictions with respect to valency. The approach we have adopted is to construct a `shallow' but global syntactic analysis of sentences for corpus material annotated with part-of-speech and punctuation-mark sequences disambiguated by a tagger. We then extract the relevant competing subanalyses surrounding a given predicate from all possible shallow analyses of the sentences. These so-called patternsets for a given predicate are then evaluated using heuristics and a simple probabilistic approximation of the correctness of a given pattern, so …
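The abstract does not spell out the probabilistic evaluation step, but a common instantiation of such a filter in this line of work (e.g. Brent, 1993 [10]) is a binomial hypothesis test over pattern counts. The sketch below illustrates that general idea only: the function names, the assumed 5% cue error rate and the significance threshold are illustrative assumptions, not the actual parameters of the system described here.

```python
from math import comb  # Python 3.8+

def binomial_tail(m: int, n: int, p: float) -> float:
    """P(X >= m) for X ~ Binomial(n, p): the probability of seeing at
    least m apparent cues for a pattern across n occurrences of the
    predicate, if every cue were tagger/parser noise with error rate p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

def accept_pattern(m: int, n: int, error_rate: float, alpha: float = 0.05) -> bool:
    """Retain a candidate valency pattern when its m observations out of
    n total occurrences are unlikely to be noise at significance alpha.
    (Hypothetical interface; the error_rate value is an assumption.)"""
    return binomial_tail(m, n, error_rate) < alpha

# Illustration: a verb seen 50 times, 9 of them with an apparent sentential
# complement; assuming a 5% chance the shallow parse cues this frame in
# error, the tail probability is well below 0.05 and the pattern survives.
print(accept_pattern(9, 50, error_rate=0.05))  # -> True
```

One property of this style of filter is that predicates with few corpus occurrences need a proportionally higher rate of cues to clear the same threshold, which guards against admitting frames on sparse, noisy counts.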
[1] Michael R. Brent et al. Automatic Acquisition of Subcategorization Frames from Tagged Text. HLT, 1991.
[2] Mayumi Mashiko. Argument Structure. The Lexicon, 1993.
[3] Gregory P. Knowles et al. Manual of information to accompany the SEC corpus. 1988.
[4] Ralph Grishman et al. Standardization of the Complement/Adjunct Distinction. 1996.
[5] Ralph Grishman et al. Comlex Syntax: Building a Computational Lexicon. COLING, 1994.
[6] Ted Briscoe et al. Large lexicons for natural language processing. 1987.
[7] Lauri Karttunen et al. Two-Level Morphology with Composition. COLING, 1992.
[8] Christopher D. Manning. Automatic Acquisition of a Large Subcategorization Dictionary from Corpora. ACL, 1993.
[9] Ralph Grishman et al. Acquisition of Selectional Patterns. COLING, 1992.
[10] Michael R. Brent et al. From Grammar to Lexicon: Unsupervised Learning of Lexical Syntax. Computational Linguistics, 1993.
[11] Karen Jensen. A Broad-Coverage Natural Language Analysis System. IWPT, 1989.
[12] Ted Briscoe et al. The Derivation of a Grammatically Indexed Lexicon from the Longman Dictionary of Contemporary English. ACL, 1987.
[13] Karen Jensen et al. Natural Language Processing: The PLNLP Approach. 2013.
[14] P. Resnik. Selection and information: a class-based approach to lexical relationships. 1993.
[15] Ido Dagan et al. Contextual word similarity and estimation from sparse data. Computer Speech and Language, 1995.
[16] Claire Grover et al. The derivation of a large computational lexicon for English from LDOCE. 1989.
[17] Ted Briscoe et al. Enjoy the Paper: Lexicology. COLING, 1990.
[18] Kimmo Koskenniemi et al. Two-Level Morphology. 1983.
[19] David Elworthy et al. Does Baum-Welch Re-estimation Help Taggers? ANLP, 1994.
[20] Ted Briscoe et al. A Formalism and Environment for the Development of a Large Grammar of English. IJCAI, 1987.
[21] James Pustejovsky et al. Type Coercion and Lexical Selection. 1993.
[22] C. Chapelle. The Computational Analysis of English: A Corpus-Based Approach. 1988.
[23] John A. Carroll. Practical unification-based parsing of Natural Language. 1993.
[24] Antonio Sanfilippo et al. Detecting Dependencies between Semantic Verb Subclasses and Subcategorization Frames in Text Corpora. 1996.