Corpus-Derived First, Second and Third-Order Word Affinities

A number of corpus-based extraction techniques have been successfully implemented that derive lists of similar words from a corpus, based on some definition of the context in which the words are found. We present here the results of refining such a list in order to extract semantic axes expressing nuances of a word's meaning. These semantic axes represent meaning distinctions grounded in the word's usage in the corpus.
