Paper information - An Experimental Study of Graph Connectivity for Unsupervised Word Sense Disambiguation

Word sense disambiguation (WSD), the task of identifying the intended meanings (senses) of words in context, has been a long-standing research objective for natural language processing. In this paper, we are concerned with graph-based algorithms for large-scale WSD. Under this framework, finding the right sense for a given word amounts to identifying the most "important" node among the set of graph nodes representing its senses. We introduce a graph-based WSD algorithm which has few parameters and does not require sense-annotated data for training. Using this algorithm, we investigate several measures of graph connectivity with the aim of identifying those best suited for WSD. We also examine how the chosen lexicon and its connectivity influence WSD performance. We report results on standard data sets and show that our graph-based approach performs comparably to the state of the art.
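
The idea of choosing a sense by node importance can be illustrated with a minimal sketch. It assumes a tiny hand-made graph and uses degree centrality as one possible connectivity measure; the abstract does not say which measures the paper evaluates, so this is not the authors' algorithm. The sense labels (such as `bank#finance`), the edge list, and the helper names `degree_centrality` and `pick_sense` are all hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return each node's degree divided by (number of nodes - 1)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def pick_sense(candidates, scores):
    """Choose the candidate sense node with the highest score."""
    return max(candidates, key=lambda sense: scores.get(sense, 0.0))


# Hypothetical toy graph: nodes are word senses, edges are lexicon relations.
edges = [
    ("bank#finance", "money#currency"),
    ("bank#finance", "deposit#payment"),
    ("money#currency", "deposit#payment"),
    ("bank#river", "water#liquid"),
]

scores = degree_centrality(edges)
best = pick_sense(["bank#finance", "bank#river"], scores)
print(best)
```

In this toy graph the financial sense has two neighbours and the river sense has one, so the financial sense is selected. Other connectivity measures could be swapped in by replacing `degree_centrality` with a function that returns a score per node.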

Mirella Lapata | Roberto Navigli
