Unsupervised WSD based on Automatically Retrieved Examples: The Importance of Bias

This paper explores the large-scale acquisition of sense-tagged examples for Word Sense Disambiguation (WSD). We applied the “WordNet monosemous relatives” method to automatically construct a web corpus, which we then used to train disambiguation systems. The corpus-building process highlighted important factors, such as the distribution of senses (bias). The corpus was used to train WSD algorithms that include supervised methods (combining automatically and manually tagged examples), minimally supervised methods (requiring sense bias information from hand-tagged corpora), and fully unsupervised methods. These methods were tested on the Senseval-2 lexical sample test set and compared favorably with other systems that use minimal or no supervision.
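The abstract names the “WordNet monosemous relatives” method and the role of the sense distribution (bias) without spelling out the mechanics. The following is a minimal Python sketch of the general idea, not the paper's implementation. It assumes a tiny hand-written lexicon (`LEXICON`, `RELATIVES`) in place of WordNet, a short list of snippets (`CORPUS`) in place of web search results, and bias given as hypothetical per-sense weights. A relative is treated as monosemous if it has exactly one sense, so any snippet containing it can be labelled with the target sense it is related to, and each sense's example quota is set in proportion to the bias. All names and data here are hypothetical.

```python
# Toy illustration of harvesting sense-labelled examples through monosemous
# relatives, with a per-sense quota set by a sense distribution (bias).
# LEXICON, RELATIVES and CORPUS are hypothetical stand-ins for WordNet and
# for web search results; they are not data from the paper.

# word -> its senses (a word with exactly one sense is monosemous)
LEXICON = {
    "church": ["church#1", "church#2", "church#3"],
    "christian_church": ["church#1"],
    "church_building": ["church#2"],
    "church_service": ["church#3"],
    "building": ["building#1", "building#2"],
    "religion": ["religion#1", "religion#2"],
}

# sense -> related words (synonyms, hypernyms, ...), hypothetical
RELATIVES = {
    "church#1": ["christian_church", "religion"],
    "church#2": ["church_building", "building"],
    "church#3": ["church_service"],
}

# hypothetical snippets standing in for search-engine results
CORPUS = [
    "the christian church split in 1054",
    "a church building in Rome",
    "the sunday church service was short",
    "they restored the old church building",
]


def monosemous_relatives(sense):
    """Relatives of `sense` that have exactly one sense in LEXICON."""
    return [w for w in RELATIVES.get(sense, []) if len(LEXICON.get(w, [])) == 1]


def allocate_examples(bias, total):
    """Split `total` examples across senses in proportion to `bias` weights,
    using largest remainders so the quotas sum exactly to `total`."""
    norm = sum(bias.values())
    raw = {s: total * w / norm for s, w in bias.items()}
    counts = {s: int(r) for s, r in raw.items()}
    leftover = total - sum(counts.values())
    # give the remaining examples to the senses with the largest fractional parts
    by_remainder = sorted(raw, key=lambda s: raw[s] - counts[s], reverse=True)
    for s in by_remainder[:leftover]:
        counts[s] += 1
    return counts


def harvest(word, corpus, bias, total):
    """Label each snippet containing a monosemous relative of a sense of `word`
    with that sense, taking at most the sense's quota."""
    quotas = allocate_examples(bias, total)
    examples = []
    for sense in LEXICON[word]:
        quota = quotas.get(sense, 0)
        for relative in monosemous_relatives(sense):
            phrase = relative.replace("_", " ")
            for snippet in corpus:
                if quota == 0:
                    break
                if phrase in snippet:
                    examples.append((snippet, sense))
                    quota -= 1
    return examples


if __name__ == "__main__":
    # Uneven (biased) quotas from hypothetical per-sense weights.
    print(allocate_examples({"church#1": 3, "church#2": 2, "church#3": 1}, 100))
    # Uniform bias: one example per sense.
    uniform = {s: 1 for s in LEXICON["church"]}
    for snippet, sense in harvest("church", CORPUS, uniform, 3):
        print(sense, "<-", snippet)
```

Note that the polysemous relatives (“religion”, “building”) are filtered out, because a snippet containing them could not be attributed to a single sense of “church”. Equal weights give every sense the same quota, while weights taken from sense counts in a hand-tagged corpus would skew the quotas toward frequent senses; in this toy, that is one way to picture how bias information could enter corpus construction in a minimally supervised setting.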
