The valency of predicates is a key component of a lexical entry, because most, if not all, recent syntactic theories `project' syntactic structure from such information in the lexicon (e.g. Pollard & Sag, 1987). Therefore, a wide-coverage robust parser utilising a grammar based on one of these theories must have access to an accurate dictionary encoding (at a minimum) valency information, and probably further details of argument structure. However, as designers of natural language processing systems have observed (e.g. Jensen, 1991), valency is closely associated with lexical sense, and the senses of a word change between corpora, sublanguages and/or subject domains. Jensen et al (1994) take this as evidence that the coupling between syntactic parsing and valency information should be much weaker than in current syntactic theories. From a more theoretical standpoint, Grimshaw (1990), Pustejovsky (1993) and others have argued that valency should instead be projected from lexical semantic information. In a recent experiment with a wide-coverage parsing system utilising a grammatical framework based on standard lexicalist assumptions, Briscoe & Carroll (1993) observed that over half the analysis failures on unseen corpus examples were caused by incorrect subcategorisation for predicate valency. Because of the close connection between sense and valency, and between subject domain and sense, it may be that a fully accurate `static' valency dictionary of the language is unattainable. The work we describe below could equally support a large-scale attempt to construct such a dictionary from substantial quantities of corpus material, or the less ambitious and more frequent construction of `disposable' dictionaries, or augmentation of `self-updating' dictionaries, as and when new corpora need to be parsed.

We have developed a system which is potentially capable of delivering putative lexical entries for predicates extracted from textual corpora, focussing on the acquisition of `argument structure' (defined as valency, semantic selectional restrictions/preferences, diathesis alternations, bounded dependency rules, such as passive or particle movement, and control of understood arguments in predicative complements), though so far we have mostly explored predictions with respect to valency. The approach we have adopted is to construct a `shallow' but global syntactic analysis of sentences for corpus material annotated with part-of-speech and punctuation-mark sequences disambiguated by a tagger. We then extract the relevant competing subanalyses surrounding a given predicate from all possible shallow analyses of the sentences. These so-called patternsets for a given predicate are then evaluated using heuristics and a simple probabilistic approximation of the correctness of a given pattern, so …
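The abstract does not spell out the probabilistic evaluation step, but a common instantiation of such a filter in this line of work (e.g. Brent, 1993 [10]) is a binomial hypothesis test over pattern counts. The sketch below illustrates that general idea only: the function names, the assumed 5% cue error rate and the significance threshold are illustrative assumptions, not the actual parameters of the system described here.

```python
from math import comb  # Python 3.8+

def binomial_tail(m: int, n: int, p: float) -> float:
    """P(X >= m) for X ~ Binomial(n, p): the probability of seeing at
    least m apparent cues for a pattern across n occurrences of the
    predicate, if every cue were tagger/parser noise with error rate p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

def accept_pattern(m: int, n: int, error_rate: float, alpha: float = 0.05) -> bool:
    """Retain a candidate valency pattern when its m observations out of
    n total occurrences are unlikely to be noise at significance alpha.
    (Hypothetical interface; the error_rate value is an assumption.)"""
    return binomial_tail(m, n, error_rate) < alpha

# Illustration: a verb seen 50 times, 9 of them with an apparent sentential
# complement; assuming a 5% chance the shallow parse cues this frame in
# error, the tail probability is well below 0.05 and the pattern survives.
print(accept_pattern(9, 50, error_rate=0.05))  # -> True
```

One property of this style of filter is that predicates with few corpus occurrences need a proportionally higher rate of cues to clear the same threshold, which guards against admitting frames on sparse, noisy counts.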
[1] Michael R. Brent et al. Automatic Acquisition of Subcategorization Frames from Tagged Text. HLT, 1991.
[2] Mayumi Mashiko. Argument Structure. The Lexicon, 1993.
[3] Gregory P. Knowles et al. Manual of information to accompany the SEC corpus. 1988.
[4] Ralph Grishman et al. Standardization of the Complement/Adjunct Distinction. 1996.
[5] Ralph Grishman et al. Comlex Syntax: Building a Computational Lexicon. COLING, 1994.
[6] Ted Briscoe et al. Large lexicons for natural language processing. 1987.
[7] Lauri Karttunen et al. Two-Level Morphology with Composition. COLING, 1992.
[8] Christopher D. Manning. Automatic Acquisition of a Large Subcategorization Dictionary from Corpora. ACL, 1993.
[9] Ralph Grishman et al. Acquisition of Selectional Patterns. COLING, 1992.
[10] Michael R. Brent et al. From Grammar to Lexicon: Unsupervised Learning of Lexical Syntax. Computational Linguistics, 1993.
[11] Karen Jensen. A Broad-Coverage Natural Language Analysis System. IWPT, 1989.
[12] Ted Briscoe et al. The Derivation of a Grammatically Indexed Lexicon from the Longman Dictionary of Contemporary English. ACL, 1987.
[13] Karen Jensen et al. Natural Language Processing: The PLNLP Approach. 2013.
[14] P. Resnik. Selection and information: a class-based approach to lexical relationships. 1993.
[15] Ido Dagan et al. Contextual word similarity and estimation from sparse data. Computer Speech and Language, 1995.
[16] Claire Grover et al. The derivation of a large computational lexicon for English from LDOCE. 1989.
[17] Ted Briscoe et al. Enjoy the Paper: Lexicology. COLING, 1990.
[18] Kimmo Koskenniemi et al. Two-Level Morphology. 1983.
[19] David Elworthy et al. Does Baum-Welch Re-estimation Help Taggers? ANLP, 1994.
[20] Ted Briscoe et al. A Formalism and Environment for the Development of a Large Grammar of English. IJCAI, 1987.
[21] James Pustejovsky et al. Type Coercion and Lexical Selection. 1993.
[22] C. Chapelle. The Computational Analysis of English: A Corpus-Based Approach. 1988.
[23] John A. Carroll. Practical unification-based parsing of Natural Language. 1993.
[24] Antonio Sanfilippo et al. Detecting Dependencies between Semantic Verb Subclasses and Subcategorization Frames in Text Corpora. 1996.