Paper information - Size matters: tight and loose context definitions in English word space models

Size matters: tight and loose context definitions in English word space models

Word Space Models use the distributional similarity between two words as a measure of their semantic similarity or relatedness. This distributional similarity, however, is influenced by the type of context the models take into account. Context definitions range on a continuum from tight to loose, depending on the size of the context window around the target or the order of the context words that are considered. This paper investigates whether two general ways of loosening the context definition (extending the context size from one to ten words, and taking into account second-order context words) produce equivalent results. In particular, we evaluate the performance of the models in terms of their ability (1) to discover semantic word classes and (2) to mirror human associations.
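The two ways of loosening the context definition can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`first_order_vectors`, `second_order_vectors`, `cosine`), the use of raw co-occurrence counts without any weighting, the symmetric window, and the toy corpus are all assumptions made for illustration. A first-order vector counts the words that occur within `window` positions of a target; a second-order vector, in this sketch, sums the first-order vectors of those context words.

```python
from collections import Counter, defaultdict
from math import sqrt


def first_order_vectors(tokens, window):
    """Count, for each word, the words within `window` positions on either side."""
    vectors = defaultdict(Counter)
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[target][tokens[j]] += 1
    return vectors


def second_order_vectors(first_order):
    """Sum the first-order vectors of each target's context words,
    weighted by how often each context word co-occurs with the target
    (an assumed weighting, chosen for simplicity)."""
    vectors = defaultdict(Counter)
    for target, contexts in first_order.items():
        for context_word, count in contexts.items():
            # .get avoids mutating the mapping while iterating over it
            for feature, weight in first_order.get(context_word, {}).items():
                vectors[target][feature] += count * weight
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus, for illustration only.
tokens = "the cat drinks milk the dog drinks water".split()

tight = first_order_vectors(tokens, window=1)    # tight context definition
loose = first_order_vectors(tokens, window=10)   # looser: larger window
second = second_order_vectors(tight)             # looser: second-order contexts

print(cosine(tight["cat"], tight["dog"]))
print(cosine(loose["cat"], loose["dog"]))
print(cosine(second["cat"], second["dog"]))
```

Comparing the three similarity scores for the same word pair shows how the choice of context definition changes which words a model treats as close, which is the kind of contrast the paper evaluates on a much larger scale.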

Yves Peirsman | Kris Heylen | Dirk Geeraerts
