RTM-DCU: Referential Translation Machines for Semantic Similarity

We use referential translation machines (RTMs) to predict the semantic similarity of text. RTMs are a computational model for identifying the acts of translation between any two data sets with respect to interpretants selected from the same domain, and they are effective for both monolingual and bilingual similarity judgments. RTMs judge the quality or the semantic similarity of text by using retrieved relevant training data as interpretants for reaching shared semantics. We derive features that measure the closeness of the test sentences to the training data via the interpretants, the difficulty of translating them, and the presence of the acts of translation, which may be observed ubiquitously in communication. RTMs provide a language-independent approach to all similarity tasks. They achieve top performance when predicting monolingual cross-level semantic similarity (Task 3) and good results in semantic relatedness and entailment (Task 1) and in multilingual semantic textual similarity (STS) (Task 10). RTMs remove the need to access any task- or domain-specific information or resource.
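To make the idea of measuring closeness to interpretants concrete, the following is a minimal illustrative sketch and not the RTM feature set itself. It assumes that interpretants are plain sentences already retrieved for the task, and it uses simple n-gram coverage as a stand-in closeness measure; the function names `ngrams`, `coverage`, and `pair_features` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def coverage(sentence, interpretants, n=1):
    """Fraction of the sentence's n-grams (by count) found in any interpretant.

    Assumption: whitespace tokenization and lowercasing are enough for a demo.
    """
    sent = ngrams(sentence.lower().split(), n)
    seen = set()
    for text in interpretants:
        seen.update(ngrams(text.lower().split(), n))
    total = sum(sent.values())
    if total == 0:
        return 0.0
    covered = sum(count for gram, count in sent.items() if gram in seen)
    return covered / total


def pair_features(s1, s2, interpretants, max_n=2):
    """Hypothetical feature vector for a sentence pair: per-sentence coverage
    and the absolute coverage difference, for n-gram orders 1..max_n."""
    feats = {}
    for n in range(1, max_n + 1):
        c1 = coverage(s1, interpretants, n)
        c2 = coverage(s2, interpretants, n)
        feats[f"cov{n}_s1"] = c1
        feats[f"cov{n}_s2"] = c2
        feats[f"cov{n}_diff"] = abs(c1 - c2)
    return feats


if __name__ == "__main__":
    interpretants = ["a man is playing a guitar", "a woman plays music"]
    print(pair_features("a man plays a guitar", "a dog barks", interpretants))
```

Features of this kind could then be passed to any regressor that predicts a similarity score. The sketch uses no task-specific resource, which mirrors the language-independent aim described above, although the actual RTM features are richer than n-gram coverage.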
