ISCAS_NLP at SemEval-2016 Task 1: Sentence Similarity Based on Support Vector Regression using Multiple Features

This paper describes our system developed for the English monolingual subtask (STS Core) of SemEval-2016 Task 1: “Semantic Textual Similarity: A Unified Framework for Semantic Processing and Evaluation”. We measure the similarity between two sentences using three types of features: word alignment-based similarity, sentence vector-based similarity, and sentence constituent similarity. Our best submitted run achieves a mean Pearson correlation of 0.69996, which outperforms the median score of all participating systems.
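As a rough illustration of the pipeline described above, the following sketch computes a small feature vector for a sentence pair and scores predictions with Pearson correlation, the metric reported for the task. The feature functions are simplified stand-ins, not the system's actual features: `overlap_similarity` is a hypothetical proxy for alignment-based similarity, `cosine_similarity` is a bag-of-words proxy for sentence vector-based similarity, and constituent similarity is left out. The regression step (Support Vector Regression, per the title) is also omitted so the sketch needs only the standard library.

```python
import math
from collections import Counter


def tokens(sentence):
    """Lowercase and split on whitespace (a toy tokenizer, assumed for illustration)."""
    return sentence.lower().split()


def overlap_similarity(s1, s2):
    """Toy proxy for alignment-based similarity: proportion of tokens shared by both sentences."""
    t1, t2 = tokens(s1), tokens(s2)
    if not t1 or not t2:
        return 0.0
    shared = sum((Counter(t1) & Counter(t2)).values())  # multiset intersection size
    return 2.0 * shared / (len(t1) + len(t2))


def cosine_similarity(s1, s2):
    """Toy proxy for sentence-vector similarity: cosine of bag-of-words count vectors."""
    c1, c2 = Counter(tokens(s1)), Counter(tokens(s2))
    dot = sum(c1[w] * c2[w] for w in c1)
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def feature_vector(s1, s2):
    """Features for one sentence pair; a regressor would be trained on these."""
    return [overlap_similarity(s1, s2), cosine_similarity(s1, s2)]


def pearson(xs, ys):
    """Pearson correlation between predicted and gold similarity scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

In a full system, the vectors returned by `feature_vector` would be passed with gold similarity scores to a regressor, and `pearson` would compare the regressor's predictions against held-out gold scores.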