Improvements in Automatic Thesaurus Extraction

The use of semantic resources is common in modern NLP systems, but methods to extract lexical semantics have only recently begun to perform well enough for practical use. We evaluate existing and new similarity metrics for thesaurus extraction, and experiment with the trade-off between extraction performance and efficiency. We propose an approximation algorithm, based on canonical attributes and coarse- and fine-grained matching, that reduces the time complexity and execution time of thesaurus extraction with only a marginal performance penalty.
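The two-stage idea in the abstract can be illustrated with a small sketch: a cheap coarse-grained test on a few canonical attributes per term, followed by a full fine-grained similarity computed only for candidates that pass it. The abstract does not say which similarity metric is used or how canonical attributes are chosen. Weighted Jaccard as the fine-grained metric, each term's top-n weighted attributes as its canonical set, the bit-mask encoding, and every function and parameter name here (`weighted_jaccard`, `canonical_bits`, `approximate_neighbours`, `n_canonical`, `min_shared`) are hypothetical illustrative assumptions. This is not the authors' implementation. Weights are assumed non-negative.

```python
from typing import Dict, List, Tuple

# Assumption: a term's context vector maps attributes (e.g. "dobj:eat") to non-negative weights.
Vector = Dict[str, float]


def weighted_jaccard(a: Vector, b: Vector) -> float:
    """Fine-grained metric (assumed): sum of min weights over sum of max weights."""
    keys = a.keys() | b.keys()
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


def canonical_bits(vectors: Dict[str, Vector], n_canonical: int) -> Dict[str, int]:
    """Hypothetical canonical attributes: each term's n highest-weighted
    attributes, encoded as a bit mask over a shared attribute index."""
    index: Dict[str, int] = {}  # attribute -> bit position, assigned on first use
    masks: Dict[str, int] = {}
    for term, vec in vectors.items():
        top = sorted(vec, key=lambda attr: (-vec[attr], attr))[:n_canonical]
        mask = 0
        for attr in top:
            mask |= 1 << index.setdefault(attr, len(index))
        masks[term] = mask
    return masks


def approximate_neighbours(target: str, vectors: Dict[str, Vector],
                           masks: Dict[str, int], min_shared: int = 1,
                           k: int = 5) -> List[Tuple[float, str]]:
    """Return up to k (score, term) pairs, best first."""
    results: List[Tuple[float, str]] = []
    for term, vec in vectors.items():
        if term == target:
            continue
        # Coarse-grained: count shared canonical attributes with one bitwise AND.
        shared = bin(masks[target] & masks[term]).count("1")
        if shared < min_shared:
            continue  # pruned without computing the full metric
        # Fine-grained: full comparison only for candidates that survive the filter.
        results.append((weighted_jaccard(vectors[target], vec), term))
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return results[:k]


if __name__ == "__main__":
    vectors = {
        "dog": {"subj:bark": 3.0, "dobj:feed": 2.0, "adj:loyal": 1.0},
        "cat": {"dobj:feed": 2.0, "subj:purr": 3.0, "adj:loyal": 0.5},
        "car": {"dobj:drive": 3.0, "dobj:park": 2.0},
    }
    masks = canonical_bits(vectors, n_canonical=2)
    print(approximate_neighbours("dog", vectors, masks))
```

In this toy run, "car" shares no canonical attribute with "dog", so it is discarded by a single AND and never reaches the full metric. The trade-off the abstract describes shows up here as a design choice: raising `min_shared` or lowering `n_canonical` prunes more comparisons, but a true neighbour whose canonical attributes happen not to overlap is missed entirely.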
