Matrix Plane Model: A Novel Measure of Word Co-occurrence and Application on Semantic Relatedness

Word co-occurrence measures quantify how strongly words co-occur in text. Most previous measures define co-occurrence through a context window whose size is fixed in advance. This size is chosen empirically and stays constant throughout the measurement process. A fixed size is not ideal, however, because the appropriate window size can differ even between two adjacent sentences of the same text. This paper proposes a novel model, the Matrix Plane Model (MPM), which determines the best-fit window size dynamically and automatically. We also describe an experiment that compares MPM with several widely used measures by applying each to semantic relatedness measurement. The results show that our approach significantly improves the performance of semantic relatedness measures.
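To make the fixed-window baseline concrete, the following is a minimal sketch of window-based co-occurrence counting. It is not MPM, whose details are not given here. The function name `cooccurrence_counts`, the look-ahead window convention, and the sample sentence are illustrative assumptions.

```python
from collections import Counter


def cooccurrence_counts(tokens, window):
    """Count unordered word pairs that occur within `window` tokens of each other.

    Hypothetical helper for illustration: `window` is the pre-decided,
    fixed context size that conventional measures rely on.
    """
    counts = Counter()
    for i, left in enumerate(tokens):
        # Look ahead at most `window` positions from token i.
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts


# Illustrative input only; not data from the paper.
tokens = "the cat sat on the mat near the dog".split()
narrow = cooccurrence_counts(tokens, window=1)
wide = cooccurrence_counts(tokens, window=4)
```

With `window=1` the pair ("cat", "mat") is never counted, while with `window=4` it is. This shows how strongly the result depends on a size that is chosen once and applied to the whole text.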
