Unsupervised Discovery of a Statistical Verb Lexicon

This paper demonstrates how unsupervised techniques can be used to learn models of deep linguistic structure. Determining the semantic roles of a verb's dependents is an important step in natural language understanding. We present a method for learning models of verb argument patterns directly from unannotated text. The learned models are similar to existing verb lexicons such as VerbNet and PropBank, but additionally include statistics about the linkings (mappings between semantic roles and syntactic positions) used by each verb. The method is based on a structured probabilistic model of the domain, and unsupervised learning is performed with the EM algorithm. The learned models can also be used discriminatively as semantic role labelers; evaluated against the PropBank annotation, the best learned model closes 28% of the error gap between an informed baseline and an oracle upper bound.
