The Computation of Word Associations: Comparing Syntagmatic and Paradigmatic Approaches

It is shown that basic language processes such as the production of free word associations and the generation of synonyms can be simulated using statistical models that analyze the distribution of words in large text corpora. According to the law of association by contiguity, the acquisition of word associations can be explained by Hebbian learning. The free word associations that subjects produce when presented with single stimulus words can therefore be predicted by applying first-order statistics to the frequencies of word co-occurrences observed in texts. The generation of synonyms can also be based on co-occurrence data but requires second-order statistics, because synonyms rarely occur together but appear in similar lexical neighborhoods. Both approaches are systematically compared and validated on empirical data. For both tasks, the performance of the statistical system turns out to be comparable to that of human subjects.
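
The distinction between first-order (syntagmatic) and second-order (paradigmatic) statistics can be illustrated with a minimal sketch. It is not the system described above: the toy corpus, the window size, the use of raw co-occurrence counts, the cosine measure, and all function names are assumptions chosen only for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_counts(sentences, window=2):
    """Count how often each word occurs within `window` tokens of another."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def first_order_associates(counts, stimulus, k=3):
    """Syntagmatic: the words that co-occur most often with the stimulus."""
    return [w for w, _ in counts.get(stimulus, Counter()).most_common(k)]


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def second_order_neighbors(counts, stimulus, k=3):
    """Paradigmatic: words whose co-occurrence vectors resemble the stimulus's."""
    target = counts.get(stimulus, Counter())
    scores = {w: cosine(target, vec) for w, vec in counts.items() if w != stimulus}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy corpus: "doctor" and "physician" never share a sentence,
# but they appear in the same lexical neighborhoods.
sentences = [
    "the doctor treated the patient".split(),
    "the physician treated the patient".split(),
    "the doctor examined the patient".split(),
    "the physician examined the child".split(),
    "the dog chased the ball".split(),
]
counts = cooccurrence_counts(sentences)
print(first_order_associates(counts, "doctor"))
print(second_order_neighbors(counts, "doctor"))
```

In this toy corpus "doctor" and "physician" have a first-order co-occurrence count of zero, yet their co-occurrence vectors are nearly identical, which is the situation the second-order comparison is meant to detect. Raw counts favor frequent function words such as "the", so a practical system would presumably replace them with an association measure that corrects for word frequency.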
