An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Question Answering

By using knowledge bases, question answering (QA) systems can now answer questions accurately on a wide variety of topics. However, knowledge bases are available in only a few major languages, so it is often necessary to build QA systems that answer questions in one language based on an information source in another (cross-lingual QA: CLQA). Machine translation (MT) is one tool for achieving CLQA, and it is intuitively clear that a better MT system improves QA accuracy. However, it is not clear whether an MT system that is better for human consumption is also better for CLQA. In this paper, we investigate the relationship between manual and automatic translation evaluation metrics and CLQA accuracy by creating a data set that contains both manual and machine translations and performing CLQA on this data set. We find that QA accuracy is closely related to a metric that considers word frequency, and through manual analysis we identify three factors of translation results that affect CLQA accuracy.
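The abstract does not name the frequency-aware metric. As one concrete illustration (an assumption, not a statement of which metric this study singled out), the NIST metric of Doddington (2002) weights each n-gram by how informative it is in the reference translations, Info(w1..wn) = log2(count(w1..wn-1) / count(w1..wn)), so rarely seen words earn more credit than frequent function words. The minimal sketch below computes only these information weights for pre-tokenized references; the helper names `ngrams` and `nist_info_weights` and the `max_n` default are hypothetical, and the full NIST score (matching against a hypothesis, brevity penalty) is not implemented.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples (hypothetical helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def nist_info_weights(references, max_n=2):
    """Sketch of NIST-style information weights over tokenized references.

    Info(w1..wn) = log2(count(w1..wn-1) / count(w1..wn)); for unigrams the
    context count is the total number of reference words.
    """
    counts = Counter()
    total_words = 0
    for ref in references:
        total_words += len(ref)
        for n in range(1, max_n + 1):
            counts.update(ngrams(ref, n))

    weights = {}
    for gram, c in counts.items():
        # Every occurrence of an n-gram also contains its (n-1)-gram prefix,
        # so the context count is always >= c and the weight is >= 0.
        context = counts[gram[:-1]] if len(gram) > 1 else total_words
        weights[gram] = math.log2(context / c)
    return weights


refs = [["the", "cat", "sat"], ["the", "dog", "sat"]]
w = nist_info_weights(refs)
print(w[("the",)], w[("cat",)])
```

With these two toy references, "cat" (seen once) receives a higher weight than "the" (seen twice), which is the sense in which a frequency-aware metric rewards translating less common, content-bearing words correctly.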
