Task Alternation in Parallel Sentence Retrieval for Twitter Translation

We present an approach to mine comparable data for parallel sentences using translation-based cross-lingual information retrieval (CLIR). By iteratively alternating between the tasks of retrieval and translation, an initial general-domain model is allowed to adapt to in-domain data. Adaptation is done by training the translation system on a few thousand sentences retrieved in the step before. Our setup is time- and memory-efficient and of similar quality as CLIR-based adaptation on millions of parallel sentences.

Stefan Riezler | Laura Jehl | Felix Hieber
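The alternation described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: a word-for-word dictionary stands in for the translation model, Jaccard word overlap stands in for the CLIR ranking function, and a co-occurrence count stands in for retraining. All names (`translate`, `retrieve`, `adapt`, `alternate`, `min_overlap`) are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def translate(sentence, table):
    """Toy word-for-word translation; unknown words pass through unchanged."""
    return [table.get(word, word) for word in sentence.split()]


def retrieve(sources, targets, table, min_overlap=0.5):
    """Retrieval step: pair each source sentence with its best-matching target.

    The match score is the Jaccard overlap between the translated source
    and the candidate target (an assumed stand-in for a real CLIR score).
    """
    pairs = []
    for src in sources:
        query = set(translate(src, table))
        best, best_score = None, 0.0
        for tgt in targets:
            candidate = set(tgt.split())
            union = query | candidate
            if not union:
                continue
            score = len(query & candidate) / len(union)
            if score > best_score:
                best, best_score = tgt, score
        if best is not None and best_score >= min_overlap:
            pairs.append((src, best))
    return pairs


def adapt(table, pairs):
    """Adaptation step: learn entries for unknown source words.

    Each unknown source word is mapped to the target word it co-occurs
    with most often among target words the current table cannot explain.
    """
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        explained = set(translate(src, table))
        leftover = [w for w in tgt.split() if w not in explained]
        for word in src.split():
            if word not in table:
                counts[word].update(leftover)
    new_table = dict(table)
    for word, counter in counts.items():
        if counter:
            new_table[word] = counter.most_common(1)[0][0]
    return new_table


def alternate(sources, targets, table, iterations=3):
    """Alternate retrieval and adaptation for a fixed number of rounds."""
    pairs = []
    for _ in range(iterations):
        pairs = retrieve(sources, targets, table)
        table = adapt(table, pairs)
    return table, pairs


if __name__ == "__main__":
    seed = {"ich": "i", "liebe": "love", "das": "the"}
    sources = ["ich liebe das spiel", "das spiel heute"]
    targets = ["i love the game", "the game today", "nice weather"]
    table, pairs = alternate(sources, targets, seed)
    print(table)
    print(pairs)
```

In this toy data the second source sentence scores below the threshold in the first round and only becomes retrievable after the table has learned an entry for "spiel" from the first retrieved pair, which mirrors, in miniature, how a general-domain model can adapt to in-domain data through its own retrieval results.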
