Paper information - Feature Vector Quality and Distributional Similarity

We suggest a new goal and evaluation criterion for word similarity measures. The new criterion - meaning-entailing substitutability - fits the needs of semantic-oriented NLP applications and can be evaluated directly (independent of an application) at a good level of human agreement. Motivated by this semantic criterion we analyze the empirical quality of distributional word feature vectors and its impact on word similarity results, proposing an objective measure for evaluating feature vector quality. Finally, a novel feature weighting and selection function is presented, which yields superior feature vectors and better word similarity performance.
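As a rough illustration of what a weighted distributional feature vector and a word similarity score look like, here is a minimal sketch. It uses positive pointwise mutual information as the feature weight and cosine as the similarity measure; both are generic stand-ins chosen for illustration, not the novel weighting and selection function the abstract refers to, which the abstract does not specify. The toy data, feature names, and the helpers `pmi_vectors` and `cosine` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy (word, feature) co-occurrence pairs. Hypothetical data for illustration only.
PAIRS = [
    ("country", "obj_of:visit"), ("country", "mod:neighboring"), ("country", "subj_of:export"),
    ("state", "obj_of:visit"), ("state", "mod:neighboring"), ("state", "subj_of:export"),
    ("party", "obj_of:join"), ("party", "mod:political"), ("party", "obj_of:visit"),
]


def pmi_vectors(pairs):
    """Build one feature vector per word, weighted by pointwise mutual information.

    Assumption: features with non-positive PMI are dropped, which acts as a
    simple form of feature selection.
    """
    pair_counts = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    feat_counts = Counter(f for _, f in pairs)
    total = len(pairs)
    vectors = defaultdict(dict)
    for (w, f), n in pair_counts.items():
        # PMI = log2( P(w, f) / (P(w) * P(f)) )
        pmi = math.log2(n * total / (word_counts[w] * feat_counts[f]))
        if pmi > 0:
            vectors[w][f] = pmi
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[f] for f, weight in u.items() if f in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


vectors = pmi_vectors(PAIRS)
print(cosine(vectors["country"], vectors["state"]))
print(cosine(vectors["country"], vectors["party"]))
```

In this toy data the feature `obj_of:visit` co-occurs with every word, so its PMI is zero and it is dropped from all vectors. This shows in miniature how the choice of weighting decides which features dominate a vector and therefore which words come out as similar.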

Ido Dagan | Maayan Zhitomirsky-Geffet
