Sentence-Level Semantic Textual Similarity Using Word-Level Semantics

Estimating semantic textual similarity between sentences is indispensable for many information retrieval tasks. Traditional lexical similarity measures capture only surface-level textual overlap and cannot account for semantic similarity. Researchers have proposed a variety of approaches to address this limitation. In this paper, we propose a novel method for semantic textual similarity that leverages word-level semantics to compute sentence-level semantic similarity. We introduce two new semantic similarity measures based on word-embedding models trained on two different corpora. In addition, we introduce a third semantic similarity measure based on word sense comparison. The similarity score for a sentence pair is then computed by applying a linear ranking approach to all proposed measures, with the importance of each measure estimated by a linear regression model. We conducted experiments using the SemEval Semantic Textual Similarity (STS-2017) test collections. The experimental results demonstrate that our method is effective for measuring semantic textual similarity and that it outperforms some known related methods.
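
As an illustration of the general idea only, the following Python sketch scores a sentence pair from word-level embedding similarities and then combines several measure scores with weights fitted by ordinary least squares. The abstract does not define the measures, so everything here is an assumption: the toy vectors, the best-match alignment in `embedding_similarity`, and the helper names `fit_weights` and `combine` are hypothetical and are not the paper's actual formulation.

```python
import numpy as np

# Toy 3-d word vectors: hypothetical stand-ins for a trained embedding model.
TOY_VECTORS = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "kitten": np.array([0.8, 0.2, 0.1]),
    "sits": np.array([0.1, 0.9, 0.0]),
    "rests": np.array([0.2, 0.8, 0.1]),
    "car": np.array([0.0, 0.1, 0.9]),
    "drives": np.array([0.1, 0.3, 0.8]),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def embedding_similarity(sent_a, sent_b, vectors):
    """Assumed measure: mean best-match cosine, averaged over both directions."""
    a = [vectors[w] for w in sent_a if w in vectors]
    b = [vectors[w] for w in sent_b if w in vectors]
    if not a or not b:
        return 0.0  # no known words on one side, so no evidence of similarity
    a_to_b = np.mean([max(cosine(u, v) for v in b) for u in a])
    b_to_a = np.mean([max(cosine(v, u) for u in a) for v in b])
    return float((a_to_b + b_to_a) / 2)


def fit_weights(features, gold):
    """Least-squares weights (last entry is the intercept) for the measures."""
    X = np.column_stack([features, np.ones(len(features))])
    weights, *_ = np.linalg.lstsq(X, gold, rcond=None)
    return weights


def combine(feature_row, weights):
    """Weighted linear combination of measure scores for one sentence pair."""
    return float(np.dot(np.append(feature_row, 1.0), weights))


# Hypothetical training data: one row per sentence pair, one column per measure.
train_features = np.array([[0.2, 0.1], [0.9, 0.8], [0.5, 0.6], [0.7, 0.3]])
train_gold = 3.0 * train_features[:, 0] + 2.0 * train_features[:, 1]
weights = fit_weights(train_features, train_gold)

close_pair = embedding_similarity(["cat", "sits"], ["kitten", "rests"], TOY_VECTORS)
far_pair = embedding_similarity(["cat", "sits"], ["car", "drives"], TOY_VECTORS)
```

In this sketch each column of `train_features` would hold one measure (for example, an embedding-based score from each corpus and a word-sense score), and the fitted weights play the role of the per-measure importance described above. A regularized regression could be swapped in for `np.linalg.lstsq` without changing the rest of the code.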
