An Experimental Study of Graph Connectivity for Unsupervised Word Sense Disambiguation

Word sense disambiguation (WSD), the task of identifying the intended meanings (senses) of words in context, has been a long-standing research objective for natural language processing. In this paper, we are concerned with graph-based algorithms for large-scale WSD. Under this framework, finding the right sense for a given word amounts to identifying the most "important" node among the set of graph nodes representing its senses. We introduce a graph-based WSD algorithm which has few parameters and does not require sense-annotated data for training. Using this algorithm, we investigate several measures of graph connectivity with the aim of identifying those best suited for WSD. We also examine how the chosen lexicon and its connectivity influence WSD performance. We report results on standard data sets and show that our graph-based approach performs comparably to the state of the art.
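The idea of choosing the most "important" sense node can be illustrated with a minimal sketch. It assumes degree centrality as the connectivity measure, which is only one possible choice and is not claimed to be the measure or the algorithm used in the paper. The toy graph, the sense labels such as bank#finance, and the function names degree_centrality and pick_sense are all hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Map each node to its degree divided by (number of nodes - 1)."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    # Guard against division by zero on degenerate one-node graphs.
    denom = max(len(neighbours) - 1, 1)
    return {node: len(adj) / denom for node, adj in neighbours.items()}


def pick_sense(candidate_senses, edges):
    """Return the candidate sense whose node has the highest centrality."""
    scores = degree_centrality(edges)
    # Senses that do not appear in the graph get a score of 0.
    return max(candidate_senses, key=lambda s: scores.get(s, 0.0))


# Hypothetical toy graph: nodes are senses of the words in one sentence,
# edges are lexicon relations assumed to connect them.
edges = [
    ("bank#finance", "money#currency"),
    ("bank#finance", "deposit#payment"),
    ("money#currency", "deposit#payment"),
    ("bank#river", "water#liquid"),
]

best = pick_sense(["bank#finance", "bank#river"], edges)
print(best)
```

In this toy graph the financial sense of "bank" is linked to two other sense nodes and the river sense to one, so the financial sense is selected. Swapping in a different connectivity measure only requires replacing degree_centrality.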
