Automated Discovery of WordNet Relations

The WordNet lexical database is now quite large and offers broad coverage of general lexical relations in English. As is evident in this volume, WordNet has been employed as a resource for many applications in natural language processing (NLP) and information retrieval (IR). However, many potentially useful lexical relations are currently missing from WordNet. Some of these relations, while useful for NLP and IR applications, are not necessarily appropriate for a general, domain-independent lexical database. For example, WordNet’s coverage of proper nouns is rather sparse, but proper nouns are often very important in application tasks.

The standard way lexicographers find new relations is to look through huge lists of concordance lines. However, sifting through long lists of concordance lines can be a rather daunting task (Church and Hanks, 1990), so a method that picks out the lines most likely to hold relations of interest should be an improvement over more traditional techniques.

This chapter describes a method for the automatic discovery of WordNet-style lexico-semantic relations by searching for corresponding lexico-syntactic patterns in large text collections. Large text corpora are now widely available, and can be viewed as vast resources from which to mine lexical, syntactic, and semantic information. This idea is reminiscent of what is known as “data mining” in the artificial intelligence literature (Fayyad and Uthurusamy, 1996); in this case, however, the ore is raw text rather than tables of numerical data. The Lexico-Syntactic Pattern Extraction (LSPE) method is meant to be useful as an automated or semi-automated aid for lexicographers and builders of domain-dependent knowledge bases. The LSPE technique is lightweight: it does not require a knowledge base or complex interpretation modules in order to suggest new WordNet relations.
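To make the idea of searching text for a lexico-syntactic pattern concrete, the following is a minimal sketch, not the chapter's implementation. It assumes a single "X such as A, B, and C" pattern of the kind associated with hyponym acquisition, and it lets single words stand in for noun phrases; the names `SUCH_AS` and `extract_hyponym_candidates` are hypothetical. A practical system would presumably match over part-of-speech-tagged or chunked text rather than raw strings.

```python
import re

# Illustrative pattern only (an assumption, not taken from the chapter):
#   <hypernym> such as <item>, <item>, and|or <item>
# Single words stand in for noun phrases to keep the sketch short.
SUCH_AS = re.compile(
    r"(?P<hyper>\w+),? such as "
    r"(?P<items>\w+(?:, (?!and\b|or\b)\w+)*(?:,? (?:and|or) \w+)?)"
)

def extract_hyponym_candidates(sentence):
    """Return (hyponym, hypernym) candidate pairs found in one sentence."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group("hyper")
        # Split the enumerated items on ", ", " and ", " or ", ", and ", ...
        items = re.split(r",? (?:and|or) |, ", match.group("items"))
        pairs.extend((item, hypernym) for item in items if item)
    return pairs

print(extract_hyponym_candidates(
    "He treats injuries such as bruises, wounds, and fractures."
))
# [('bruises', 'injuries'), ('wounds', 'injuries'), ('fractures', 'injuries')]
```

Each returned pair is only a candidate relation; in the semi-automated setting described above, a lexicographer would still decide whether it belongs in the lexicon.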

[1]  Simonetta Montemagni, et al.  Structural Patterns vs. String Patterns for Extracting Semantic Information from Dictionaries, 1992, COLING.

[2]  Philip Resnik, et al.  Using Information Content to Evaluate Semantic Similarity in a Taxonomy, 1995, IJCAI.

[3]  Yorick Wilks, et al.  Providing machine tractable dictionary tools, 1990, Machine Translation.

[4]  J. Ross Quinlan, et al.  Induction of Decision Trees, 1986, Machine Learning.

[5]  Yael Ravin, et al.  Disambiguating and Interpreting Verb Definitions, 1990, ACL.

[6]  Marti A. Hearst.  Automatic Acquisition of Hyponyms from Large Text Corpora, 1992, COLING.

[7]  Christopher D. Manning.  Automatic Acquisition of a Large Subcategorization Dictionary From Corpora, 1993, ACL.

[8]  Hinrich Schütze, et al.  Word Space, 1992, NIPS.

[9]  Gregory Grefenstette, et al.  Explorations in automatic thesaurus discovery, 1994.

[10]  Lisa F. Rau, et al.  SCISOR: extracting information from on-line news, 1990, CACM.

[11]  P. Resnik.  Selection and information: a class-based approach to lexical relationships, 1993.

[12]  Martha W. Evens, et al.  Semantically Significant Patterns in Dictionary Definitions, 1986, ACL.

[13]  Martin Chodorow, et al.  Extracting Semantic Hierarchies from a Large On-Line Dictionary, 1985, ACL.

[14]  Roberto Basili, et al.  A Shallow Syntactic Analyser to Extract Word Associations from Corpora, 1992.

[15]  Jan O. Pedersen, et al.  An object-oriented architecture for text retrieval, 1991, RIAO.

[16]  Hinrich Schütze, et al.  Customizing a Lexicon to Better Suit a Computational Task, 1996.

[17]  Kathleen McKeown, et al.  Automatically Extracting and Representing Collocations for Language Generation, 1990, ACL.

[18]  Julian Kupiec, et al.  MURAX: a robust linguistic approach for question answering using an on-line encyclopedia, 1993, SIGIR.

[19]  Donald Hindle, et al.  Noun Classification From Predicate-Argument Structures, 1990, ACL.

[20]  Penelope Sibun, et al.  A Practical Part-of-Speech Tagger, 1992, ANLP.

[21]  Makoto Nagao, et al.  Extraction of Semantic Information from an Ordinary English Dictionary and its Evaluation, 1988, COLING.

[22]  Eric Brill, et al.  Unsupervised Learning of Disambiguation Rules for Part of Speech Tagging, 1995, VLC@ACL.

[23]  Paul S. Jacobs, et al.  Acquiring Lexical Knowledge from Text: A Case Study, 1988, AAAI.

[24]  Lucy Vanderwende.  Ambiguity in the Acquisition of Lexical Information, 1995, ArXiv.

[25]  Nicoletta Calzolari, et al.  Acquisition of Lexical Information from a Large Textual Italian Corpus, 1990, COLING.

[26]  Hiyan Alshawi, et al.  Processing Dictionary Definitions with Phrasal Pattern Hierarchies, 1987, CL.

[27]  Yorick Wilks, et al.  Is there content in empty heads?, 1990, COLING.

[28]  Karen Jensen, et al.  Disambiguating Prepositional Phrase Attachments by Using On-Line Dictionary Definitions, 1987, Comput. Linguistics.

[29]  Paola Velardi, et al.  Computer Aided Interpretation of Lexical Cooccurrences, 1989, ACL.

[30]  Kenneth Ward Church, et al.  Word Association Norms, Mutual Information, and Lexicography, 1989, ACL.

[31]  Frank Smadja.  Retrieving collocations from text, 1993.

[32]  Michael R. Brent, et al.  From Grammar to Lexicon: Unsupervised Learning of Lexical Syntax, 1993, Comput. Linguistics.