A New Minimally-Supervised Framework for Domain Word Sense Disambiguation

We present a new minimally supervised framework for domain-driven Word Sense Disambiguation (WSD). Glossaries for several domains are acquired iteratively from the Web by means of a bootstrapping technique. The acquired glosses then serve as the sense inventory for fully unsupervised domain WSD. Our experiments, on both new and gold-standard datasets, show that this wide-coverage framework achieves high performance on dozens of domains at both coarse-grained and fine-grained levels.
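To make concrete what it means to use glosses as a sense inventory, the following is a minimal sketch that assigns a target term to the domain whose gloss shares the most words with the surrounding context. It is a simple gloss-overlap heuristic in the spirit of Lesk, swapped in for illustration only; it is not the disambiguation algorithm of this paper, which the abstract does not specify. The names `tokenize`, `disambiguate`, `STOPWORDS`, and the toy glossaries are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or",
             "is", "that", "for", "with", "by"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def disambiguate(context, glossaries):
    """Pick the domain whose gloss overlaps most with the context.

    glossaries maps a domain label to the gloss of the target term in that
    domain. Returns (domain, score), or (None, 0) when no gloss word occurs
    in the context.
    """
    context_counts = Counter(tokenize(context))
    best_domain, best_score = None, 0
    for domain, gloss in glossaries.items():
        gloss_words = set(tokenize(gloss))
        # Score = number of context tokens that also appear in the gloss.
        score = sum(c for w, c in context_counts.items() if w in gloss_words)
        if score > best_score:
            best_domain, best_score = domain, score
    return best_domain, best_score


# Toy glosses for the term "virus" in two domains (invented for illustration,
# not taken from the acquired glossaries described in the paper).
toy_glossaries = {
    "computing": "a program that infects files and spreads between computers",
    "medicine": "an infectious agent that replicates inside the living cells of an organism",
}

if __name__ == "__main__":
    print(disambiguate("The virus replicates inside living cells", toy_glossaries))
    print(disambiguate("The virus corrupted files on the server", toy_glossaries))
```

A bag-of-words overlap ignores inflection (for example, "computer" does not match "computers"), so a realistic system would at least lemmatize both the context and the glosses before comparing them.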
