Predicting Strong Associations on the Basis of Corpus Data

Current approaches to the prediction of associations rely on just one type of information, generally taking the form of either word space models or collocation measures. How these approaches compare to one another remains an open question. In this paper, we investigate the performance of these two types of models and of a new approach based on compounding. The best single predictor is the log-likelihood ratio, followed closely by the document-based word space model. We show, however, that an ensemble method combining these two best approaches with the compounding algorithm improves performance by almost 30% over the current state of the art.
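
To make the collocation-based predictor concrete, the sketch below computes a log-likelihood ratio for one cue-candidate pair from a 2x2 contingency table of co-occurrence counts. The abstract does not state which formulation the paper uses, so this assumes the widely used G² statistic; the function name, argument names, and counts are illustrative only and are not taken from the paper.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of co-occurrence counts.

    Assumed cell layout (hypothetical, for illustration):
      k11: contexts containing both the cue and the candidate
      k12: contexts containing the cue but not the candidate
      k21: contexts containing the candidate but not the cue
      k22: contexts containing neither
    """
    def xlogx_sum(*counts):
        # Sum of k * ln(k), treating 0 * ln(0) as 0.
        return sum(k * math.log(k) for k in counts if k > 0)

    n = k11 + k12 + k21 + k22
    cells = xlogx_sum(k11, k12, k21, k22)
    rows = xlogx_sum(k11 + k12, k21 + k22)
    cols = xlogx_sum(k11 + k21, k12 + k22)
    return 2.0 * (cells - rows - cols + xlogx_sum(n))


# Made-up counts: the two words always occur together or not at all.
strong = log_likelihood_ratio(10, 0, 0, 10)
# Made-up counts: the two words co-occur exactly as often as chance predicts.
independent = log_likelihood_ratio(10, 10, 10, 10)
```

Under this formulation, a candidate that co-occurs with the cue no more often than chance scores close to zero, and higher scores indicate stronger evidence of association. Candidates for a cue could then be ranked by this score.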
