Measuring Sentence Similarity from Both the Perspectives of Commonalities and Differences

The similarity between two sentences can be determined by comparing either their commonalities or their differences. Commonalities, which reflect similarity judgment, connect the two sentences, while differences, which reflect dissimilarity judgment, capture what is unique to each sentence. Although both are essential in determining sentence similarity, existing methods focus on a single perspective, mostly that of commonalities. This paper presents a method that calculates sentence similarity from multiple perspectives by taking both commonalities and differences into consideration. Experimental results on a standard data set show that the proposed method outperforms the baseline, the best-performing existing single-perspective measure, with a statistically significant improvement.
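To make the idea of combining commonalities and differences concrete, the following is a minimal sketch only. It is not the method proposed in the paper, whose details are not given in this abstract. It assumes a simple bag-of-words view of a sentence and uses Tversky's ratio model as a stand-in; the function names `tokens` and `tversky_similarity` and the weights `alpha` and `beta` are illustrative choices.

```python
def tokens(sentence):
    """Lowercase and split on whitespace; returns the set of distinct words."""
    return set(sentence.lower().split())


def tversky_similarity(a, b, alpha=0.5, beta=0.5):
    """Ratio-model score: shared words weighed against words unique to each side.

    alpha and beta (assumed weights) control how strongly the words found
    only in `a` or only in `b` reduce the score.
    """
    ta, tb = tokens(a), tokens(b)
    common = len(ta & tb)   # commonalities
    only_a = len(ta - tb)   # differences: words only in a
    only_b = len(tb - ta)   # differences: words only in b
    denom = common + alpha * only_a + beta * only_b
    # Two empty sentences are treated as identical (an arbitrary convention).
    return common / denom if denom else 1.0


if __name__ == "__main__":
    print(tversky_similarity("the cat sat on the mat", "the cat lay on the mat"))
```

Setting `alpha` and `beta` to zero ignores differences entirely, so the score depends on commonalities alone; raising them makes unshared words count more heavily against the score.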
