Automatic Association of Web Directories with Word Senses

We describe an algorithm that combines lexical information (from WordNet 1.7) with Web directories (from the Open Directory Project) to associate word senses with those directories. Such associations can serve as rich sense characterizations for automatically acquiring sense-tagged corpora, clustering topically related senses, and detecting sense specializations. The algorithm is evaluated on the 29 nouns (147 senses) used in the Senseval 2 competition. It obtains 148 (word sense, Web directory) associations, which cover 88% of the domain-specific word senses in the test data with 86% accuracy. The richness of Web directories as sense characterizations is evaluated in a supervised word sense disambiguation task using the Senseval 2 test suite. The results indicate that, when the directory/word sense association is correct, the samples automatically acquired from the Web directories are nearly as valid for training as the original Senseval 2 training instances. The results support our hypothesis that Web directories are a rich source of lexical information: cleaner, more reliable, and more structured than the full Web as a corpus.
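The abstract does not spell out how a word sense is linked to a directory. As a purely illustrative sketch, and not the authors' algorithm, one simple approach is to count the words shared between a sense's WordNet-style description and a directory's descriptive text, keeping only the best match when it clears a threshold. The sense IDs, directory paths, descriptions, and the `min_score` threshold below are hypothetical toy values.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercased set of alphabetic words; a crude stand-in for real preprocessing."""
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap_score(sense_words: set[str], directory_text: str) -> int:
    """Number of sense-characterizing words that also occur in the directory text."""
    return len(sense_words & tokenize(directory_text))

def associate(senses: dict[str, str], directories: dict[str, str],
              min_score: int = 2) -> dict[str, str]:
    """Map each sense to its best-scoring directory, if that score reaches min_score.

    Senses whose best overlap is below the threshold are left unassociated,
    trading coverage for accuracy.
    """
    if not directories:
        return {}
    result = {}
    for sense_id, description in senses.items():
        words = tokenize(description)
        best_dir, best = max(
            ((d, overlap_score(words, text)) for d, text in directories.items()),
            key=lambda pair: pair[1],
        )
        if best >= min_score:
            result[sense_id] = best_dir
    return result

# Hypothetical toy data: three senses of "bass" and two directory descriptions.
SENSES = {
    "bass#1": "low-pitched musical voice, singer, music",
    "bass#2": "freshwater fish related to the perch",
    "bass#3": "the lowest part in polyphonic music",
}
DIRECTORIES = {
    "Arts/Music": "music instruments guitar bass singer voice",
    "Recreation/Outdoors/Fishing": "fishing freshwater fish lures perch",
}

associations = associate(SENSES, DIRECTORIES)
print(associations)
```

With these toy inputs, `bass#1` links to the music directory and `bass#2` to the fishing directory. `bass#3` stays unassigned because its best overlap is a single word, below the threshold. Leaving low-evidence senses unassociated is the same coverage-versus-accuracy trade-off that the evaluation figures measure.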

Julio Gonzalo | M. Felisa Verdejo | Celina Santamaría
