Leveraging VerbNet to Build Corpus-Specific Verb Clusters

In this paper, we aim to close the gap between extensive, human-built semantic resources and corpus-driven unsupervised models. The particular resource explored here is VerbNet, whose organizing principle is that semantics and syntax are linked. To capture patterns of usage that can augment knowledge resources like VerbNet, we extend a Dirichlet process mixture model to predict a VerbNet class for each sense of each verb, which lets us incorporate annotated VerbNet data to guide the clustering process. The resulting clusters align more closely with hand-curated syntactic/semantic groupings than those of any previous model, and can be adapted to new domains since they require only corpus counts.
