A Large Subcategorization Lexicon for Natural Language Processing Applications

We introduce a large computational subcategorization lexicon which includes subcategorization frame (SCF) and frequency information for 6,397 English verbs. This extensive lexicon was acquired automatically from five corpora and the Web using the current version of the comprehensive subcategorization acquisition system of Briscoe and Carroll (1997). The lexicon is provided freely for research use, along with a script which can be used to filter and build sub-lexicons suited for different natural language processing (NLP) purposes. Documentation is also provided which explains each sub-lexicon option and evaluates its accuracy.
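As an illustration of what building a sub-lexicon by filtering might involve, the following is a minimal sketch that keeps only the frames whose relative frequency for a verb reaches a threshold. It is not the distributed script: the in-memory format, the frame labels, the counts, and the names `filter_lexicon` and `min_rel_freq` are all assumptions made for this example, and relative-frequency thresholding is only one plausible filtering option.

```python
# Hypothetical in-memory format: verb -> {SCF label -> corpus count}.
# The frame labels and counts are invented for illustration only.
lexicon = {
    "give": {"NP-NP": 60, "NP-PP": 35, "NP": 5},
    "sleep": {"INTRANS": 97, "PP": 3},
}


def filter_lexicon(lexicon, min_rel_freq):
    """Return verb -> {SCF -> relative frequency}, dropping rare frames.

    A frame is kept when its share of the verb's total count is at
    least min_rel_freq. Verbs left with no frames are omitted.
    """
    filtered = {}
    for verb, frames in lexicon.items():
        total = sum(frames.values())
        if total == 0:
            continue  # no evidence for this verb at all
        kept = {
            scf: count / total
            for scf, count in frames.items()
            if count / total >= min_rel_freq
        }
        if kept:
            filtered[verb] = kept
    return filtered


# Build a sub-lexicon that discards frames below 10% relative frequency.
sub_lexicon = filter_lexicon(lexicon, 0.10)
for verb, frames in sorted(sub_lexicon.items()):
    print(verb, sorted(frames))
```

A higher threshold would trade recall for precision, which is the kind of choice the accompanying documentation is said to evaluate for each sub-lexicon option.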

[1]  Ralph Grishman,et al.  Comlex Syntax: Building a Computational Lexicon , 1994, COLING.

[2]  Mats Rooth,et al.  Valence Induction with a Head-Lexicalized PCFG , 1998, EMNLP.

[3]  Ted Briscoe,et al.  Extended Lexical-Semantic Classification of English Verbs , 2004, HLT-NAACL 2004.

[4]  李幼升,et al.  Ph , 1989 .

[5]  Ding Yuan,et al.  Natural language generation in the context of machine translation , 2002 .

[6]  Branimir Boguraev,et al.  Large Lexicons for Natural Language Processing: Utilising the Grammar Coding System of LDOCE , 1987, CL.

[7]  Geoffrey Leech,et al.  100 Million Words of English: The British National Corpus (BNC) , 1992 .

[8]  Stanley F. Chen,et al.  An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[9]  Ted Briscoe,et al.  Automatic Extraction of Subcategorization from Corpora , 1997, ANLP.

[10]  Diana McCarthy,et al.  Disambiguating Nouns, Verbs, and Adjectives Using Automatically Acquired Selectional Preferences , 2003, CL.

[11]  김두식,et al.  English Verb Classes and Alternations , 2006 .

[12]  Michael R. Brent,et al.  From Grammar to Lexicon: Unsupervised Learning of Lexical Syntax , 1993, CL.

[13]  Anna Korhonen,et al.  Improving Subcategorization Acquisition Using Word Sense Disambiguation , 2003, ACL.

[14]  Ted Briscoe,et al.  Parser evaluation: a survey and a new proposal , 1998, LREC.

[15]  Chris Brew,et al.  Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information , 2002, ACL.

[16]  Frank Keller,et al.  Lexicalization in Crosslinguistic Probabilistic Parsing: The Case of French , 2005, ACL.

[17]  Mark Stevenson,et al.  The Reuters Corpus Volume 1 - from Yesterday’s News to Tomorrow’s Language Resources , 2002, LREC.

[18]  Yuval Krymolowski,et al.  On the Robustness of Entropy-Based Similarity Measures in Evaluation of Subcategorization Acquisition Systems , 2002, CoNLL.

[19]  Sanda M. Harabagiu,et al.  Using Predicate-Argument Structures for Information Extraction , 2003, ACL.

[20]  Beth Levin.  English Verb Classes and Alternations , 1993 .

[21]  Frank Keller,et al.  Verb Frame Frequency as a Predictor of Verb Bias , 2001, Journal of psycholinguistic research.

[22]  Yuval Krymolowski,et al.  Clustering Polysemic Subcategorization Frame Distributions Semantically , 2003, ACL.

[23]  Slava M. Katz,et al.  Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process.

[24]  Eva Esteve Ferrer.  Towards a Semantic Classification of Spanish Verbs Based on Subcategorisation Information , 2004, ACL.

[25]  Anna Korhonen,et al.  Semantically Motivated Subcategorization Acquisition , 2002, ACL 2002.

[26]  George A. Miller,et al.  Introduction to WordNet: An On-line Lexical Database , 1990 .

[27]  Diana McCarthy,et al.  Using Semantic Preferences to Identify Verbal Participation in Role Switching Alternations , 2000, ANLP.

[28]  Daisuke Kawahara,et al.  Japanese case structure analysis by unsupervised construction of a case frame dictionary , 2000, COLING 2000.

[29]  Ted Briscoe,et al.  Robust Accurate Statistical Annotation of General Text , 2002, LREC.

[30]  Ted Briscoe,et al.  Large lexicons for natural language processing , 1987 .