An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Question Answering

By exploiting knowledge bases, question answering (QA) systems can now answer questions accurately over a wide variety of topics. However, knowledge bases exist for only a few major languages, so it is often necessary to build QA systems that answer questions posed in one language using an information source in another (cross-lingual QA: CLQA). Machine translation (MT) is one way to realize CLQA, and it is intuitively clear that a better MT system should improve QA accuracy. However, it is not clear whether an MT system that is better for human consumption is also better for CLQA. In this paper, we investigate the relationship between manual and automatic translation evaluation metrics and CLQA accuracy by creating a data set containing both manual and machine translations and performing CLQA over this data set. We find that QA accuracy is closely related to a metric that considers the frequency of words, and through manual analysis we identify three factors of translation results that affect CLQA accuracy.
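As an illustrative sketch only (not the paper's actual pipeline or data), the following Python snippet shows the general shape of such an analysis: score each MT system's translated question with a simple BLEU-style clipped n-gram precision, then correlate those scores with whether the downstream QA system answered correctly. All system names, example sentences, and helper functions here are hypothetical.

from collections import Counter
from math import sqrt

def ngram_counts(tokens, n):
    # Count the n-grams appearing in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(hypothesis, reference, n=2):
    # BLEU-style modified n-gram precision for one sentence pair.
    hyp, ref = ngram_counts(hypothesis, n), ngram_counts(reference, n)
    overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return overlap / max(sum(hyp.values()), 1)

def pearson(xs, ys):
    # Pearson correlation between two equal-length score lists.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: each MT system's translation of one question, plus
# whether the downstream QA system answered that question correctly.
reference = "what is the capital of france".split()
systems = {
    "mt_a": ("what is the capital of france".split(), 1),
    "mt_b": ("what capital france is of".split(), 0),
    "mt_c": ("what is capital of france".split(), 1),
}

metric_scores = [clipped_precision(hyp, reference) for hyp, _ in systems.values()]
qa_scores = [float(correct) for _, correct in systems.values()]
print(pearson(metric_scores, qa_scores))

A metric that weights n-grams by how rare they are in a corpus, as NIST does, could be substituted for the plain clipped precision in the same loop; this is the kind of frequency-sensitive metric the abstract refers to.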
