A supervised ranking approach for detecting relationally similar word pairs

Relational similarity is the similarity between the semantic relations that hold in two word pairs. For example, the semantic relation "is a large" holds between the two words in each of the pairs (lion, cat) and (ostrich, bird), because a lion is a large cat and an ostrich is the largest living bird on earth. Consequently, the two word pairs (lion, cat) and (ostrich, bird) are considered relationally similar. Analogous pairs of words exhibit a high degree of relational similarity. Measuring the relational similarity between word pairs is important in numerous natural language processing tasks, such as solving word-analogy questions, classifying noun-modifier relations, and disambiguating word senses. We propose a supervised ranking-based method that, given a word pair, detects relationally similar word pairs using information retrieved from a Web search engine. First, each word pair is represented by a vector of automatically extracted lexical patterns. Then, a ranking Support Vector Machine is trained to recognize word pairs whose semantic relations are similar to those of the given word pair. To train and evaluate the proposed method, we use a benchmark dataset of 374 SAT multiple-choice word-analogy questions. To represent the relations that hold between two word pairs, we experiment with 11 different feature functions, both symmetric and asymmetric. Our experimental results show that the proposed ranking-based approach outperforms several previously proposed relational similarity measures on this benchmark dataset, achieving an SAT score of 46.9.
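As a rough illustration of the pattern-vector representation described above, the following Python sketch ranks candidate word pairs by how closely their pattern vectors match that of a stem pair. It is only a simplified baseline under stated assumptions: it uses plain cosine similarity in place of the trained ranking Support Vector Machine, the names `cosine` and `rank_candidates` are hypothetical, and the pattern counts are invented toy values rather than data retrieved from a search engine.

```python
from collections import Counter
from math import sqrt


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse pattern-count vectors."""
    dot = sum(count * v[pattern] for pattern, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def rank_candidates(stem: Counter, candidates: dict) -> list:
    """Return (pair, score) tuples sorted from most to least similar."""
    scored = [(pair, cosine(stem, vec)) for pair, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy pattern counts (assumed values, for illustration only).
# X and Y stand for the first and second word of a pair.
stem_vector = Counter({"X is a large Y": 5, "X, the largest Y": 2})  # (lion, cat)
candidate_vectors = {
    ("ostrich", "bird"): Counter({"X is a large Y": 3, "X, the largest Y": 4}),
    ("wheel", "car"): Counter({"X of the Y": 6}),
}

ranking = rank_candidates(stem_vector, candidate_vectors)
for pair, score in ranking:
    print(pair, round(score, 3))
```

In this toy setting, (ostrich, bird) ranks above (wheel, car) because it shares lexical patterns with the stem pair. A learned ranker would instead weight feature functions computed over such vectors, which is where the supervised training on SAT questions comes in.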
