A Step-wise Usage-based Method for Inducing Polysemy-aware Verb Classes

We present an unsupervised method for inducing verb classes from verb uses in giga-word corpora. Our method consists of two clustering steps: verb-specific semantic frames are first induced by clustering verb uses in a corpus, and verb classes are then induced by clustering these frames. This step-wise approach not only generates verb classes from a massive number of verb uses in a scalable manner, but also handles verb polysemy, which most previous studies on verb clustering bypass. In our experiments, we acquire semantic frames and verb classes from two giga-word corpora, the larger comprising 20 billion words. We verify the effectiveness of our approach through quantitative evaluations based on polysemy-aware gold-standard data.
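The two-step structure can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the abstract: each verb use is represented as a set of hypothetical `slot:word` features, and a toy greedy clustering based on Jaccard overlap stands in for whatever clustering model is actually used in each step. The function names, thresholds, and data are illustrative only.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard overlap between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(items, threshold):
    """Toy single-pass clustering (a stand-in, not the paper's model).

    Each (label, features) item joins the first cluster whose pooled
    features overlap enough; otherwise it opens a new cluster.
    """
    clusters = []
    for label, feats in items:
        for cluster in clusters:
            if jaccard(cluster["features"], feats) >= threshold:
                cluster["features"] |= feats
                cluster["members"].append(label)
                break
        else:
            clusters.append({"features": set(feats), "members": [label]})
    return clusters


def induce_verb_classes(uses, frame_threshold=0.3, class_threshold=0.3):
    """Step 1: uses -> verb-specific frames. Step 2: frames -> verb classes."""
    # Step 1: cluster the uses of each verb separately.
    by_verb = defaultdict(list)
    for verb, feats in uses:
        by_verb[verb].append((verb, set(feats)))
    frames = []
    for verb, verb_uses in by_verb.items():
        for i, cluster in enumerate(greedy_cluster(verb_uses, frame_threshold)):
            frames.append((f"{verb}#{i}", cluster["features"]))

    # Step 2: cluster the frames of all verbs together.
    classes = greedy_cluster(frames, class_threshold)

    # A verb belongs to every class that contains one of its frames.
    return [sorted({frame_id.split("#")[0] for frame_id in c["members"]})
            for c in classes]


# Hypothetical toy data: (verb, features of one use).
toy_uses = [
    ("run", {"subj:athlete", "prep_in:race"}),
    ("run", {"subj:athlete", "prep_in:park"}),
    ("run", {"subj:manager", "dobj:company"}),
    ("jog", {"subj:athlete", "prep_in:park"}),
    ("manage", {"subj:manager", "dobj:company"}),
    ("manage", {"subj:manager", "dobj:team"}),
]

if __name__ == "__main__":
    for verb_class in induce_verb_classes(toy_uses):
        print(verb_class)
```

On this toy data, "run" yields two frames in the first step, so it ends up in two classes after the second step, one shared with "jog" and one shared with "manage". Because classes are built over frames rather than over verbs, a polysemous verb is not forced into a single class.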
