Word2Vec vs DBnary: Augmenting METEOR using Vector Representations or Lexical Resources?

This paper presents an approach that combines lexico-semantic resources and distributed word representations for machine translation (MT) evaluation. The study is carried out by enriching a well-known MT evaluation metric, METEOR, which allows approximate matches (synonymy or morphological similarity) between an automatic translation and a reference translation. Our experiments are conducted within the framework of the WMT 2014 Metrics task. We show that distributed representations are a good alternative to lexico-semantic resources for MT evaluation and that they can even contribute useful additional information. The augmented versions of METEOR that use vector representations are available on our GitHub page.
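As an illustration only, the idea of replacing a synonym lookup with a vector-based approximate match can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names, the toy vectors, and the 0.8 similarity threshold are all assumptions made for the example, and a real setup would load trained word embeddings instead.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def approx_match(hyp_word, ref_word, vectors, threshold=0.8):
    """Hypothetical matcher: exact match first, then vector similarity.

    `vectors` maps a word to its embedding; `threshold` is an assumed
    cut-off, not a value taken from the paper.
    """
    if hyp_word == ref_word:
        return True
    if hyp_word not in vectors or ref_word not in vectors:
        return False
    return cosine(vectors[hyp_word], vectors[ref_word]) >= threshold

# Toy 3-dimensional vectors, invented for this example.
TOY_VECTORS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.95],
}

if __name__ == "__main__":
    print(approx_match("car", "automobile", TOY_VECTORS))
    print(approx_match("car", "banana", TOY_VECTORS))
```

In this sketch a hypothesis word counts as matched when it is identical to the reference word or when the two embeddings are close enough, which mirrors the role a synonym module plays in an approximate-match metric.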
