Paper Information - Automated Discovery of WordNet Relations

Automated Discovery of WordNet Relations

The WordNet lexical database is now quite large and offers broad coverage of general lexical relations in English. As is evident in this volume, WordNet has been employed as a resource for many applications in natural language processing (NLP) and information retrieval (IR). However, many potentially useful lexical relations are currently missing from WordNet. Some of these relations, while useful for NLP and IR applications, are not necessarily appropriate for a general, domain-independent lexical database. For example, WordNet’s coverage of proper nouns is rather sparse, but proper nouns are often very important in application tasks. The standard way lexicographers find new relations is to look through huge lists of concordance lines. However, sifting through long lists of concordance lines can be a rather daunting task (Church and Hanks, 1990), so a method that picks out the lines most likely to hold relations of interest should be an improvement over more traditional techniques. This chapter describes a method for the automatic discovery of WordNet-style lexico-semantic relations by searching for corresponding lexico-syntactic patterns in large text collections. Large text corpora are now widely available, and can be viewed as vast resources from which to mine lexical, syntactic, and semantic information. This idea is reminiscent of what is known as “data mining” in the artificial intelligence literature (Fayyad and Uthurusamy, 1996); in this case, however, the ore is raw text rather than tables of numerical data. The Lexico-Syntactic Pattern Extraction (LSPE) method is meant to be useful as an automated or semi-automated aid for lexicographers and builders of domain-dependent knowledge bases. The LSPE technique is lightweight; it does not require a knowledge base or complex interpretation modules in order to suggest new WordNet relations.
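The idea of searching text for a lexico-syntactic pattern can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the single "X such as A, B, and C" pattern, the one-word stand-in for a noun phrase, the function name `extract_candidates`, and the sample sentence are all assumptions made for illustration. Each match only suggests candidate (hyponym, hypernym) pairs that a lexicographer would still need to review.

```python
import re

# Stand-in for a noun phrase: one alphabetic word that is not "and"/"or".
# Assumption: a real system would use a noun-phrase chunker instead.
ITEM = r"(?!(?:and|or)\b)[A-Za-z]+"
# Separators allowed between list items: ", and ", " or ", ", " and similar.
SEP = r"(?:,\s+(?:and|or)\s+|\s+(?:and|or)\s+|,\s+)"

# Hypothetical pattern: "<hypernym> such as <hyponym>, <hyponym>, and <hyponym>".
SUCH_AS = re.compile(
    rf"\b(?P<hyper>{ITEM}),?\s+such\s+as\s+(?P<hypos>{ITEM}(?:{SEP}{ITEM})*)"
)


def extract_candidates(text):
    """Return candidate (hyponym, hypernym) pairs found by the pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group("hyper")
        # Split the matched list back into its individual items.
        for hyponym in re.split(SEP, match.group("hypos")):
            pairs.append((hyponym, hypernym))
    return pairs


if __name__ == "__main__":
    sentence = "Injuries such as bruises, wounds, and fractures are common."
    for hyponym, hypernym in extract_candidates(sentence):
        print(f"{hyponym} -> {hypernym}")
```

Because the sketch relies only on surface patterns, it needs no knowledge base or interpretation module, which matches the lightweight character the abstract describes. Its precision depends entirely on how well the assumed pattern fits the corpus.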

Marti A. Hearst
