Measuring the Degree of Synonymy between Words Using Relational Similarity between Word Pairs as a Proxy

Two types of similarity between words have been studied in the natural language processing community: synonymy and relational similarity. A high degree of similarity exists between synonymous words, whereas a high degree of relational similarity exists between analogous word pairs. We present and empirically test a hypothesis that links these two types of similarity. Specifically, we propose a method that measures the degree of synonymy between two words using relational similarity between word pairs as a proxy. Given two words, we first represent the semantic relations that hold between them using lexical patterns, and use a sequential pattern clustering algorithm to identify different lexical patterns that represent the same semantic relation. Second, we compute the degree of synonymy between the two words using an inter-cluster covariance matrix. We compare the proposed method against previously proposed methods for measuring the degree of synonymy on the Miller-Charles dataset and the WordSimilarity-353 dataset. The proposed method outperforms all existing Web-based similarity measures, achieving a statistically significant Pearson correlation coefficient of 0.867 on the Miller-Charles dataset.

Key words: synonymy, attributional similarity, relational similarity, Miller-Charles dataset, WordSimilarity-353 dataset
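The evaluation reported in the abstract, a Pearson correlation between system scores and human similarity ratings, can be illustrated with a minimal sketch. The function name `pearson` and the toy numbers in the usage note are hypothetical and chosen only for illustration; they are not taken from the Miller-Charles dataset or from the authors' implementation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Hypothetical helper for illustration: xs would hold human similarity
    ratings and ys the scores produced by a similarity measure.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

As a usage example, `pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])` is 1 up to floating-point rounding, because the second sequence is an exact positive linear function of the first. In an evaluation of the kind described above, the two arguments would be the human ratings and the system scores for the same list of word pairs.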
