SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation

Semantic Textual Similarity (STS) measures the degree to which two sentences are similar in meaning. Applications include machine translation (MT), summarization, generation, question answering (QA), short answer grading, semantic search, and dialog and conversational systems. The STS shared task is a venue for assessing the current state of the art. The 2017 task focuses on multilingual and cross-lingual pairs, with one sub-track exploring MT quality estimation (MTQE) data. The task obtained strong participation from 31 teams, with 17 participating in all language tracks. We summarize performance and review a selection of well-performing methods. Our analysis highlights common errors, providing insight into the limitations of existing models. To support ongoing work on semantic representations, the STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017).
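To make the evaluation setting concrete: an STS system assigns each sentence pair a similarity score, and in the SemEval STS tasks these predictions are compared against human gold ratings on a 0 to 5 scale using Pearson correlation. The following is a minimal sketch of that comparison only, not the official task scorer; the function name `pearson_r` and the toy scores are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(system: Sequence[float], gold: Sequence[float]) -> float:
    """Pearson correlation between system STS predictions and gold ratings.

    Hypothetical helper for illustration; not the official SemEval scorer.
    """
    if len(system) != len(gold) or len(system) < 2:
        raise ValueError("need two equal-length score lists with at least 2 items")
    n = len(system)
    mean_s = sum(system) / n
    mean_g = sum(gold) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((s - mean_s) * (g - mean_g) for s, g in zip(system, gold))
    var_s = sum((s - mean_s) ** 2 for s in system)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    if var_s == 0 or var_g == 0:
        raise ValueError("correlation is undefined for constant score lists")
    return cov / math.sqrt(var_s * var_g)


if __name__ == "__main__":
    gold = [5.0, 3.8, 2.0, 0.4]    # hypothetical human ratings (0 to 5)
    system = [4.6, 3.9, 2.5, 1.0]  # hypothetical system predictions
    print(f"Pearson r = {pearson_r(system, gold):.3f}")
```

Because Pearson correlation is invariant to positive linear rescaling, a system is rewarded for ranking and spacing pairs consistently with the annotators, not for matching the 0 to 5 scale exactly.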
