Paper information - NeRoSim: A System for Measuring and Interpreting Semantic Textual Similarity

This paper presents our system developed for SemEval 2015 Shared Task 2 (2a: English Semantic Textual Similarity, STS, and 2c: Interpretable Similarity) and the results of the submitted runs. For the English STS subtask, we used regression models that combine a wide array of features, including semantic similarity scores obtained from various methods. One of our runs achieved a weighted mean correlation score of 0.784 on the sentence similarity subtask (i.e., English STS) and ranked tenth among 74 runs submitted by 29 teams. For the interpretable similarity pilot task, we employed a rule-based approach blended with chunk alignment labeling and scoring based on semantic similarity features. Our system for interpretable text similarity was among the top three performing systems.

[1] Vasile Rus, et al. Experiments with Semantic Similarity Measures Based on LDA and LSA, 2013, SLSP.

[2] Mihai Surdeanu, et al. The Stanford CoreNLP Natural Language Processing Toolkit, 2014, ACL.

[3] Graeme Hirst, et al. Lexical chains as representations of context for the detection and correction of malapropisms, 1995.

[4] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[5] Eneko Agirre, et al. SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity, 2012, *SEMEVAL.

[6] Ted Pedersen, et al. Extended Gloss Overlaps as a Measure of Semantic Relatedness, 2003, IJCAI.

[7] Ian H. Witten, et al. WEKA: a machine learning workbench, 1994, Proceedings of ANZIIS '94 - Australian New Zealand Intelligent Information Systems Conference.

[8] Philip Resnik, et al. Using Information Content to Evaluate Semantic Similarity in a Taxonomy, 1995, IJCAI.

[9] Nobal B. Niraula, et al. The SIMILAR Corpus: A Resource To Foster The Qualitative Understanding of Semantic Similarity of Texts, 2012.

[10] Christiane Fellbaum, et al. Combining Local Context and Wordnet Similarity for Word Sense Identification, 1998.

[11] Chris Quirk, et al. Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources, 2004, COLING.

[12] Chris Brockett, et al. Aligning the RTE 2006 Corpus, 2007.

[13] Danielle S. McNamara, et al. Handbook of latent semantic analysis, 2007.

[14] David W. Conrath, et al. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy, 1997, ROCLING/IJCLCLP.

[15] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[16] Vasile Rus, et al. Lemon and Tea Are Not Similar: Measuring Word-to-Word Similarity by Combining Different Methods, 2015, CICLing.

[17] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.

[18] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[19] Claire Cardie, et al. SemEval-2014 Task 10: Multilingual Semantic Textual Similarity, 2014, *SEMEVAL.