Learning Verb Argument Structure from Minimally Annotated Corpora

In this paper we investigate the task of automatically identifying the correct argument structure for a set of verbs. The argument structure of a verb allows us to predict the relationship between the verb's syntactic arguments and their roles in its underlying lexical semantics. Following the method of Merlo and Stevenson (2001), we exploit the distributions of selected features from the local context of a verb. These features were extracted from a 23M-word WSJ corpus using part-of-speech tags and phrasal chunks alone. We constructed several decision tree classifiers trained on this data; the best-performing classifier achieved an error rate of 33.4%. This work shows that a subcategorization frame (SF) learning algorithm previously applied to Czech (Sarkar and Zeman, 2000) can also be used to extract SFs in English. The extracted SFs are evaluated by classifying verbs into verb alternation classes.
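As a rough illustration of the pipeline summarized above, the following minimal sketch turns per-verb counts of local-context features into relative-frequency vectors, fits a one-level decision tree (a decision stump) on them, and reports an error rate. It is not the classifier used in this work: the feature names, the toy vectors, and the helper functions `feature_distribution`, `train_stump`, `predict`, and `error_rate` are all hypothetical, and a stump stands in for a full decision tree learner only to keep the example self-contained.

```python
from collections import Counter

# Hypothetical feature inventory; the real features are not listed in the abstract.
FEATURES = ["transitive", "intransitive", "passive"]


def feature_distribution(observations, features=FEATURES):
    """Relative frequency of each feature among a verb's observed contexts."""
    counts = Counter(observations)
    total = sum(counts[f] for f in features)
    if total == 0:
        return [0.0 for _ in features]
    return [counts[f] / total for f in features]


def train_stump(vectors, labels):
    """Pick the single (feature, threshold) split with the fewest training errors."""
    best = None
    for j in range(len(vectors[0])):
        values = sorted(set(v[j] for v in vectors))
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for v, y in zip(vectors, labels) if v[j] <= threshold]
            right = [y for v, y in zip(vectors, labels) if v[j] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, j, threshold, left_label, right_label)
    if best is None:
        raise ValueError("no feature varies across the training vectors")
    return best[1:]


def predict(stump, vector):
    """Apply a trained stump to one feature vector."""
    j, threshold, left_label, right_label = stump
    return left_label if vector[j] <= threshold else right_label


def error_rate(predicted, gold):
    """Fraction of items whose predicted class differs from the gold class."""
    return sum(p != g for p, g in zip(predicted, gold)) / len(gold)


# Toy data, invented for illustration only.
train_vectors = [[0.8, 0.2, 0.0], [0.7, 0.2, 0.1], [0.2, 0.8, 0.0], [0.1, 0.9, 0.0]]
train_labels = ["object-drop", "object-drop", "unergative", "unergative"]

stump = train_stump(train_vectors, train_labels)
predictions = [predict(stump, v) for v in train_vectors]
print(error_rate(predictions, train_labels))
```

A full decision tree learner applies the same split search recursively to each side of the split; the stump shows only the first level.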

[1]  金田 重郎, et al.  C4.5: Programs for Machine Learning (book review), 1995.

[2]  Alex Waibel, et al.  The Automatic Acquisition of Frequencies of Verb Subcategorization Frames from Tagged Corpora, 2002.

[3]  Paola Merlo, et al.  Automatic verb classification based on statistical distributions of argument structure, 2001.

[4]  Sabine Schulte im Walde.  Clustering Verbs Semantically According to their Alternation Behaviour, 2000, COLING.

[5]  Beth Levin.  English verb classes and alternations, 1993.

[6]  Suzanne Stevenson, et al.  Automatic Verb Classification Based on Statistical Distributions of Argument Structure, 2001, CL.

[7]  Mats Rooth, et al.  Valence Induction with a Head-Lexicalized PCFG, 1998, EMNLP.

[8]  Steven Abney, et al.  Part-of-Speech Tagging and Partial Parsing, 1997.

[9]  Michael R. Brent, et al.  From Grammar to Lexicon: Unsupervised Learning of Lexical Syntax, 1993, Comput. Linguistics.

[10]  Anoop Sarkar, et al.  Automatic Extraction of Subcategorization Frames for Czech, 2000, COLING.

[11]  Christopher D. Manning, et al.  Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger, 2000, EMNLP.

[12]  P. Bickel, et al.  Mathematical Statistics: Basic Ideas and Selected Topics, 1977.

[13]  R. F., et al.  Mathematical Statistics, 1944, Nature.

[14]  Mirella Lapata, et al.  Using Subcategorization to Resolve Verb Class Ambiguity, 1999, EMNLP.

[15]  J. Ross Quinlan, et al.  C4.5: Programs for Machine Learning, 1992.

[16]  Anna Korhonen, et al.  Detecting Verbal Participation in Diathesis Alternations, 1998, ACL.

[17]  Suzanne Stevenson, et al.  Automatic Verb Classification Using Distributions of Grammatical Features, 1999, EACL.

[18]  Suzanne Stevenson, Paola Merlo.  Lexical structure and parsing complexity, 1997.

[19]  Adwait Ratnaparkhi, et al.  A Maximum Entropy Model for Part-Of-Speech Tagging, 1996, EMNLP.

[20]  Ted Briscoe, et al.  Automatic Extraction of Subcategorization from Corpora, 1997, ANLP.

[21]  Anna Korhonen, et al.  Statistical Filtering and Subcategorization Frame Acquisition, 2000, EMNLP.

[22]  Suzanne Stevenson, et al.  Supervised Learning of Lexical Semantic Verb Classes Using Frequency Distributions, 1999, SIGLEX Workshop On Standardizing Lexical Resources.

[23]  Maria Lapata, et al.  Acquiring Lexical Generalizations from Corpora: A Case Study for Diathesis Alternations, 1999, ACL.

[24]  Daniel Gildea.  Probabilistic Models of Verb-Argument Structure, 2002, COLING.

[25]  Ted Dunning, et al.  Accurate Methods for the Statistics of Surprise and Coincidence, 1993, CL.

[26]  Christopher D. Manning.  Automatic Acquisition of a Large Subcategorization Dictionary From Corpora, 1993, ACL.

[27]  Eugene Charniak, et al.  A statistical syntactic disambiguation program and what it learns, 1995, Learning for Natural Language Processing.