Bootstrapping a Semantic Lexicon on Verb Similarities

We present a bootstrapping algorithm that builds a semantic lexicon from a list of seed words and a corpus mined from the web. We use extraction patterns to bootstrap the lexicon and collocation statistics to dynamically score new lexicon entries. Extraction patterns are then scored by computing their conditional probability relative to an unrelated text corpus. We find that highly domain-related verbs achieve the highest accuracy, and that collocation statistics can affect accuracy both positively and negatively during the bootstrapping runs.
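The abstract does not give the exact scoring formulas, so the following is only a minimal sketch of the kind of loop it describes, under explicit assumptions. The pattern score is assumed to be P(domain | pattern) = c_domain / (c_domain + c_background), where the background counts come from the unrelated corpus. The Dice coefficient is a stand-in for whatever collocation statistic is actually used. All function names, thresholds and toy counts are hypothetical.

```python
# Hypothetical sketch: formulas, names and thresholds are assumptions,
# not the method described in the abstract.
from typing import Dict, FrozenSet, Iterable, Set


def pattern_score(pattern: str, domain_counts: Dict[str, int],
                  background_counts: Dict[str, int]) -> float:
    """Assumed P(domain | pattern): share of a pattern's occurrences that
    fall in the domain corpus rather than the unrelated background corpus."""
    d = domain_counts.get(pattern, 0)
    b = background_counts.get(pattern, 0)
    return d / (d + b) if d + b else 0.0


def dice(w1: str, w2: str, pair_counts: Dict[FrozenSet[str], int],
         word_counts: Dict[str, int]) -> float:
    """Dice coefficient, used here only as a stand-in collocation statistic."""
    joint = pair_counts.get(frozenset((w1, w2)), 0)
    denom = word_counts.get(w1, 0) + word_counts.get(w2, 0)
    return 2 * joint / denom if denom else 0.0


def bootstrap(seeds: Iterable[str],
              extractions: Dict[str, Set[str]],
              pattern_scores: Dict[str, float],
              pair_counts: Dict[FrozenSet[str], int],
              word_counts: Dict[str, int],
              iterations: int = 3, per_iter: int = 1,
              min_pattern_score: float = 0.5,
              min_word_score: float = 0.1) -> Set[str]:
    lexicon = set(seeds)
    for _ in range(iterations):
        # Keep domain-specific patterns that already extract a lexicon word.
        active = [p for p, words in extractions.items()
                  if pattern_scores.get(p, 0.0) >= min_pattern_score
                  and words & lexicon]
        # Candidates: words extracted by those patterns, not yet in the lexicon.
        candidates = set().union(*(extractions[p] for p in active)) - lexicon
        # Score each candidate by its mean collocation with the CURRENT lexicon,
        # so scores change as the lexicon grows ("dynamic" scoring).
        scored = {w: sum(dice(w, e, pair_counts, word_counts) for e in lexicon)
                  / len(lexicon)
                  for w in candidates}
        ranked = sorted(scored, key=lambda w: (-scored[w], w))
        accepted = [w for w in ranked if scored[w] >= min_word_score][:per_iter]
        if not accepted:
            break  # no candidate passes the collocation threshold; stop early
        lexicon.update(accepted)
    return lexicon


# Toy usage with made-up counts (hypothetical data).
domain = {"<x> was diagnosed": 8, "prescribed <x>": 6, "<x> said": 5}
background = {"<x> was diagnosed": 1, "prescribed <x>": 0, "<x> said": 20}
scores = {p: pattern_score(p, domain, background) for p in domain}
extractions = {
    "<x> was diagnosed": {"flu", "asthma", "diabetes", "patient"},
    "prescribed <x>": {"aspirin", "insulin"},
    "<x> said": {"doctor", "minister"},
}
word_counts = {"flu": 10, "asthma": 8, "diabetes": 6, "patient": 20}
pair_counts = {frozenset(("flu", "asthma")): 4,
               frozenset(("flu", "diabetes")): 1,
               frozenset(("asthma", "diabetes")): 3,
               frozenset(("flu", "patient")): 2}
lexicon = bootstrap({"flu"}, extractions, scores, pair_counts, word_counts)
print(sorted(lexicon))  # ['asthma', 'diabetes', 'flu']
```

In this toy run the generic pattern "<x> said" is filtered out because most of its occurrences come from the background corpus. Lowering `min_word_score` to 0.0 lets "patient" in on the third iteration, a small illustration of how a collocation threshold can trade recall against drift in a sketch like this one.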