SAARSHEFF at SemEval-2016 Task 1: Semantic Textual Similarity with Machine Translation Evaluation Metrics and (eXtreme) Boosted Tree Ensembles

This paper describes the SAARSHEFF systems that participated in the English Semantic Textual Similarity (STS) task at SemEval-2016. We extend prior work on using machine translation (MT) metrics for STS by automatically annotating each pair of text snippets in the STS datasets with a variety of MT scores. We trained our systems using boosted tree ensembles and achieved competitive results that outperform the median Pearson correlation scores of all participating systems.
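The pipeline summarized above (MT-style scores as features for each snippet pair, a boosted tree regressor, Pearson correlation for evaluation) can be illustrated with a minimal, self-contained sketch. It is not the SAARSHEFF implementation: the n-gram overlap feature is a simple stand-in for the MT metrics, the depth-1 regression stumps stand in for the (eXtreme) boosted tree ensembles, and the toy pairs, gold scores, and all function names are invented for illustration.

```python
from collections import Counter
from math import sqrt

def ngram_f1(a, b, n):
    """Symmetric n-gram overlap (F1) between two whitespace-tokenized snippets.
    Assumption: a crude stand-in for an MT evaluation metric."""
    ta, tb = a.lower().split(), b.lower().split()
    ga = Counter(zip(*[ta[i:] for i in range(n)]))
    gb = Counter(zip(*[tb[i:] for i in range(n)]))
    overlap = sum((ga & gb).values())
    if overlap == 0:
        return 0.0
    p, r = overlap / sum(gb.values()), overlap / sum(ga.values())
    return 2 * p * r / (p + r)

def features(a, b):
    # One feature per "metric": unigram and bigram overlap.
    return [ngram_f1(a, b, 1), ngram_f1(a, b, 2)]

def fit_stump(X, residuals):
    """Pick the (feature, threshold) split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, ml, mr)
    return best[1:] if best else None

def fit_boosted(X, y, rounds=50, lr=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no usable split (all feature rows identical)
            break
        j, t, ml, mr = stump
        stumps.append(stump)
        preds = [p + lr * (ml if row[j] <= t else mr) for p, row in zip(preds, X)]
    return base, lr, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(ml if row[j] <= t else mr for j, t, ml, mr in stumps)

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

# Toy data (hypothetical): snippet pairs with made-up similarity scores on a 0-5 scale.
pairs = [
    ("a man is playing a guitar", "a man is playing a guitar", 5.0),
    ("a man is playing a guitar", "a man plays a guitar", 4.0),
    ("a woman is cooking", "the stock market fell", 0.0),
    ("two kids play football", "two kids play soccer outside", 3.0),
]
X = [features(a, b) for a, b, _ in pairs]
y = [score for _, _, score in pairs]

model = fit_boosted(X, y)
r = pearson([predict(model, row) for row in X], y)
print(f"Pearson correlation on the toy training pairs: {r:.3f}")
```

The correlation here is measured on the same four pairs used for fitting, so it only shows that the pieces connect; a real evaluation would score held-out test pairs, and a production system would replace the stumps with a library such as XGBoost.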