Paper information - Matrix Plane Model: A Novel Measure of Word Co-occurrence and Application on Semantic Relatedness

Word co-occurrence measures quantify how strongly words co-occur in text. Most previous measures define co-occurrence with a predetermined context window whose size is chosen empirically and stays fixed throughout the measurement. This is not ideal, because the appropriate window size can differ even between two adjacent sentences of a text. This paper proposes a novel model, the Matrix Plane Model (MPM), which captures the best-fit window size dynamically and automatically. We also set up an experiment that compares MPM with several widely used measures by applying them to semantic relatedness measurement. The results show that our approach significantly improves the performance of semantic relatedness measures.
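
To make the fixed-window baseline concrete, the following is a minimal sketch of how a conventional co-occurrence count might be computed. It is not the MPM method, which the abstract does not specify; the function name `window_cooccurrence`, the `window_size` parameter, and the toy sentence are assumptions introduced only for illustration.

```python
from collections import Counter


def window_cooccurrence(tokens, window_size):
    """Count unordered word pairs that occur within `window_size` tokens of each other.

    Hypothetical helper illustrating a fixed-window baseline, not MPM.
    """
    counts = Counter()
    for i, left in enumerate(tokens):
        # Look ahead at most `window_size` positions from token i.
        for right in tokens[i + 1 : i + 1 + window_size]:
            if left != right:
                # Sort the pair so (a, b) and (b, a) share one key.
                counts[tuple(sorted((left, right)))] += 1
    return counts


# Toy input (an assumption, not data from the paper).
tokens = "the cat sat on the mat".split()
narrow = window_cooccurrence(tokens, window_size=1)
wide = window_cooccurrence(tokens, window_size=3)
```

With this toy input, the pair ("cat", "on") is counted only under the wider window, which shows how the resulting counts depend on a size that has to be fixed in advance.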

Yukio Ohsawa | Ji Qi
