Assessing Query Translation Quality Using Back Translation in Hindi-English CLIR

Cross-Language Information Retrieval (CLIR) is a demanding research area of Information Retrieval (IR) that deals with retrieving documents written in a language different from the query language. Translation is central to CLIR: its goal is to translate the query or the documents from one language into another. Correct query translation is essential, because an incorrect translation can reduce the relevance of the retrieved results. The purpose of this paper is to compute the accuracy of query translation using back translation for a Hindi-English CLIR system. For the experimental analysis, Hindi queries were selected from the FIRE 2011 dataset. Our analysis shows that back translation can be effective in improving the accuracy of query translation for the three translators studied (Google, Microsoft and Babylon). Of the three, Google was found to perform best for this purpose.
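
To illustrate the general idea of back translation as an accuracy check, the following is a minimal sketch and not the procedure used in the paper. It assumes a round trip (source query to target language and back) scored by unigram F1 overlap with the original query; the scoring metric, the function names `token_f1` and `back_translation_score`, and the toy dictionary translators with romanized placeholder words are all hypothetical and stand in for real translation services.

```python
from collections import Counter
from typing import Callable, Dict


def token_f1(original: str, back_translated: str) -> float:
    """Unigram F1 overlap between the original query and its back translation."""
    ref = original.lower().split()
    hyp = back_translated.lower().split()
    if not ref or not hyp:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(ref) & Counter(hyp)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def back_translation_score(
    query: str,
    forward: Callable[[str], str],
    backward: Callable[[str], str],
) -> float:
    """Translate the query forward, translate it back, and score the round trip."""
    translated = forward(query)
    restored = backward(translated)
    return token_f1(query, restored)


def make_word_translator(table: Dict[str, str]) -> Callable[[str], str]:
    """Hypothetical word-by-word translator used only for this illustration."""
    def translate(text: str) -> str:
        return " ".join(table.get(word, word) for word in text.split())
    return translate


# Toy lexicons (assumptions, romanized placeholders rather than real output).
hi_to_en = make_word_translator(
    {"bharat": "india", "chunav": "election", "parinam": "results"}
)
# The reverse lexicon picks a different word for "results", so the round trip
# does not fully restore the original query.
en_to_hi = make_word_translator(
    {"india": "bharat", "election": "chunav", "results": "nateeje"}
)

score = back_translation_score("bharat chunav parinam", hi_to_en, en_to_hi)
print(round(score, 3))
```

In this sketch a score near 1 means the round trip reproduced the original query, while a lower score flags a query whose translation may need attention. A real comparison of translators would substitute actual translation services for the toy lexicons and could use any established MT evaluation metric in place of unigram F1.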
