Lost and Found in Translation: Cross-Lingual Question Answering with Result Translation

Cross-lingual question answering (CLQA) lets users find information in languages they do not know. In this thesis, we consider the broader problem of CLQA with result translation, where answers retrieved by a CLQA system must be translated back into the user's language by a machine translation (MT) system. The task is challenging because an answer is correct only if it is both relevant to the question and adequately translated. We show that integrating MT closely with cross-lingual retrieval improves result relevance, and that automatically correcting errors in the MT output improves the adequacy of translated results.

To understand the task better, we carry out detailed error analyses of the impact of MT errors on CLQA with result translation. We identify which MT errors are most detrimental to the task and how different cross-lingual information retrieval (CLIR) systems respond to different kinds of MT errors. We describe two main types of CLQA errors caused by MT errors: lost-in-retrieval errors, where relevant results are not returned, and lost-in-translation errors, where relevant results are perceived as irrelevant because of inadequate MT.

To address lost-in-retrieval errors, we introduce two novel CLIR models that combine complementary source-language and target-language information from MT. We show empirically that these hybrid, bilingual models outperform both monolingual models and a prior hybrid model. Even once relevant results are retrieved, users will not recognize them as relevant unless they are translated adequately. Rather than improving a specific MT system, we take a more general approach that can be applied to the output of any MT system.
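The two error types defined above partition the relevant documents for each query. The following is a minimal Python 3.9+ sketch, not code from the thesis. It assumes three per-query sets are available: gold-relevant document IDs, IDs returned by the CLIR system, and IDs that a reader of the translated result judged relevant. All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ErrorBreakdown:
    """Partition of one query's gold-relevant documents (hypothetical names)."""

    found: set[str]                # retrieved and recognized as relevant
    lost_in_retrieval: set[str]    # relevant but never returned
    lost_in_translation: set[str]  # returned, but judged irrelevant after MT

    def recall(self) -> float:
        """Fraction of relevant documents that were both retrieved and recognized."""
        total = len(self.found) + len(self.lost_in_retrieval) + len(self.lost_in_translation)
        return len(self.found) / total if total else 0.0


def classify(relevant, retrieved, judged_relevant) -> ErrorBreakdown:
    """Split the relevant set according to where each document was lost, if anywhere.

    relevant:        gold-relevant document IDs for the query
    retrieved:       IDs returned by the CLIR system
    judged_relevant: IDs a reader of the *translated* result judged relevant
    Documents judged relevant but absent from `relevant` are ignored here.
    """
    relevant = set(relevant)
    retrieved_relevant = relevant & set(retrieved)
    found = retrieved_relevant & set(judged_relevant)
    return ErrorBreakdown(
        found=found,
        lost_in_retrieval=relevant - retrieved_relevant,
        lost_in_translation=retrieved_relevant - found,
    )


# Toy example with invented document IDs.
breakdown = classify(
    relevant={"d1", "d2", "d3", "d4"},
    retrieved=["d1", "d2", "d5"],
    judged_relevant={"d1", "d5"},
)
print(breakdown.lost_in_retrieval == {"d3", "d4"})  # True
print(breakdown.lost_in_translation == {"d2"})      # True
print(breakdown.recall())                           # 0.25
```

Only documents that survive both stages count toward this end-to-end recall. As a result, better retrieval alone cannot recover documents that are later lost in translation.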
Our adequacy-oriented automatic post-editors (APEs) use resources from the CLQA context and information from the MT system to detect and correct phrase-level MT errors automatically at query time. They focus on the errors most likely to affect CLQA: deleted or missing content words and mistranslated named entities. Human evaluations show that these adequacy-oriented APEs successfully adapt task-agnostic MT systems to the needs of the CLQA task.

Since there is no existing test data for translingual QA or IR tasks, we create a translingual information retrieval (TLIR) evaluation corpus. We also develop an analysis framework that isolates the impact of MT errors on CLIR and on result understanding, and that evaluates the TLIR task as a whole. We use the TLIR corpus to carry out a task-embedded MT evaluation. It shows that our CLIR models address lost-in-retrieval errors, yielding higher TLIR recall, and that the APEs correct many lost-in-translation errors, yielding more adequately translated results.
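The abstract does not say how deleted content words are detected. One common heuristic flags source-side content words that receive no link in the word alignment exported by the MT system. This heuristic is an assumption here, not something taken from the thesis. Below is a minimal Python 3.9+ sketch of that detection step only. The content-tag set and the function name are hypothetical.

```python
from collections.abc import Iterable

# Hypothetical choice: nouns, verbs and adjectives (Penn-style tag prefixes)
# count as content words.
CONTENT_TAG_PREFIXES = ("NN", "VB", "JJ")


def find_dropped_content_words(
    src_tokens: list[str],
    src_tags: list[str],
    alignment: Iterable[tuple[int, int]],
) -> list[tuple[int, str]]:
    """Return (index, token) for each source content word with no alignment link.

    `alignment` holds (source_index, target_index) pairs. An unaligned content
    word is only a candidate deletion; this function flags it and does not fix it.
    """
    if len(src_tokens) != len(src_tags):
        raise ValueError("src_tokens and src_tags must have the same length")
    aligned_src = {s for s, _ in alignment}
    return [
        (i, tok)
        for i, (tok, tag) in enumerate(zip(src_tokens, src_tags))
        if tag.startswith(CONTENT_TAG_PREFIXES) and i not in aligned_src
    ]


# Toy example (invented tokens, tags and alignment): the MT output kept
# "the", "minister" and "Cairo" but dropped the verb and the time expression.
tokens = ["the", "minister", "visited", "Cairo", "yesterday"]
tags = ["DT", "NN", "VBD", "NNP", "NN"]
links = [(0, 0), (1, 1), (3, 2)]
print(find_dropped_content_words(tokens, tags, links))
# [(2, 'visited'), (4, 'yesterday')]
```

A flagged word is only a candidate error. A post-editor would still have to decide whether and where to insert a translation for it, for example from a bilingual lexicon available in the CLQA context.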
