Learning Verb Subcategorization from Corpora: Counting Frame Subsets

We present novel machine learning techniques for identifying subcategorization information for verbs in Czech, and we compare three different statistical techniques applied to this problem. We show how the learning algorithm can be used to discover previously unknown subcategorization frames from the Czech Prague Dependency Treebank. The algorithm can then be used to label the dependents of a verb in the Czech treebank as either arguments or adjuncts. Using these techniques, we achieve 88% accuracy on unseen parsed text.
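The abstract does not name the three statistical tests. The following is therefore only a minimal sketch of the general idea under stated assumptions, not the authors' method. It uses a binomial hypothesis test in the style of Brent (1993) as the filter, with an assumed miscue rate `p_error` and significance level `alpha`. It also applies a hypothetical subset-counting rule: when a frame is rejected, its count is credited to every subset that is one dependent smaller. Dependents that fall inside the largest accepted frame are then labeled arguments and the rest adjuncts. The function names, thresholds, and frame labels (`N4`, `ADV`) are all hypothetical placeholders.

```python
from itertools import combinations
from math import exp, lgamma, log, log1p


def binomial_tail(m: int, n: int, p: float) -> float:
    """Return P(X >= m) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space so large verb counts do not overflow.
    """
    log_p, log_q = log(p), log1p(-p)
    return sum(
        exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) + k * log_p + (n - k) * log_q)
        for k in range(max(m, 0), n + 1)
    )


def select_frames(observed, p_error=0.05, alpha=0.05):
    """Filter the observed frames of one verb (hypothetical sketch).

    observed maps a frame (frozenset of dependent labels) to its count.
    A frame is accepted when seeing it this often by chance alone, at the
    assumed miscue rate p_error, has probability below alpha.
    """
    counts = dict(observed)  # work on a copy; counts grow as frames are rejected
    total = sum(observed.values())  # number of occurrences of the verb
    accepted = set()
    if not counts:
        return accepted
    # Visit frames from largest to smallest so rejected counts flow downward.
    for size in range(max(len(f) for f in counts), -1, -1):
        for frame in [f for f in counts if len(f) == size]:
            m = counts[frame]
            if binomial_tail(m, total, p_error) < alpha:
                accepted.add(frame)
            elif size > 0:
                # Hypothetical rule: credit every subset one dependent smaller.
                for sub in combinations(sorted(frame), size - 1):
                    key = frozenset(sub)
                    counts[key] = counts.get(key, 0) + m
    return accepted


def label_dependents(dependents, accepted):
    """Label dependents in the largest accepted frame as arguments, others as adjuncts."""
    observed = frozenset(dependents)
    candidates = [f for f in accepted if f <= observed]
    # Ties between equally large frames are broken by sorted label order (arbitrary).
    best = max(candidates, key=lambda f: (len(f), sorted(f)), default=frozenset())
    return {d: "argument" if d in best else "adjunct" for d in dependents}


if __name__ == "__main__":
    # Hypothetical counts for one verb; the labels are illustrative placeholders.
    counts = {frozenset({"N4"}): 30, frozenset({"N4", "ADV"}): 2, frozenset(): 8}
    frames = select_frames(counts)
    print(label_dependents(["N4", "ADV"], frames))
    # prints {'N4': 'argument', 'ADV': 'adjunct'}
```

Processing larger frames first lets a rejected frame such as {N4, ADV} pass its evidence down to {N4}. As a result, an adjunct that happens to co-occur with an object does not yield a spurious two-argument frame, yet the object frame still gains support from those occurrences.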
