Learning Syntactic Verb Frames using Graphical Models

We present a novel approach for building verb subcategorization lexicons using a simple graphical model. In contrast to previous methods, the model can be trained without parsed input or a predefined inventory of subcategorization frames. Our method outperforms the state of the art on a verb clustering task and is easily trained on arbitrary domains. We complement this quantitative evaluation with a qualitative discussion of verbs and their frames. We also discuss the advantages of graphical models for this task, in particular the ease of integrating semantic information about verbs and arguments in a principled fashion, and conclude with directions for future work to augment the approach.
