Using targeted paraphrasing and monolingual crowdsourcing to improve translation

Targeted paraphrasing is a new approach to obtaining cost-effective, reasonable-quality translation that combines simple, inexpensive human computations by monolingual speakers with machine translation. The key insight behind the process is twofold: likely translation errors can be spotted with only monolingual knowledge of the target language, and alternative ways to say the same thing (i.e., paraphrases) can be generated with only monolingual knowledge of the source language. Formal evaluation demonstrates that this approach can yield substantial improvements in translation quality. The idea has also been integrated into a broader framework for monolingual collaborative translation that produces fully accurate, fully fluent translations for a majority of sentences in a real-world translation task, with no involvement of human bilingual speakers.
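
The loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`targeted_paraphrase`, `toy_translate`, `toy_looks_wrong`, `toy_paraphrase`) and the toy data are hypothetical, and the sketch simplifies by flagging and paraphrasing whole sentences rather than individual error spans. The target-side check stands in for a monolingual target-language speaker, and the paraphrase function stands in for a monolingual source-language speaker.

```python
from typing import Callable, List


def targeted_paraphrase(
    source: str,
    translate: Callable[[str], str],
    looks_wrong: Callable[[str], bool],
    paraphrase: Callable[[str], List[str]],
) -> str:
    """Translate `source`, retrying with source-side paraphrases whenever
    the target-side check flags the machine translation output."""
    first = translate(source)
    if not looks_wrong(first):
        return first
    for alternative in paraphrase(source):
        candidate = translate(alternative)
        if not looks_wrong(candidate):
            return candidate
    # No paraphrase produced an acceptable output; keep the original one.
    return first


# Toy stand-ins, all hypothetical: a lookup-table "MT system", a
# target-side reader who flags a literal rendering of an idiom, and a
# source-side speaker who offers a plainer wording.
TOY_MT = {
    "he kicked the bucket": "il a donné un coup de pied au seau",
    "he died": "il est mort",
}


def toy_translate(sentence: str) -> str:
    return TOY_MT.get(sentence, "???")


def toy_looks_wrong(translation: str) -> bool:
    return "seau" in translation or translation == "???"


def toy_paraphrase(sentence: str) -> List[str]:
    return {"he kicked the bucket": ["he died"]}.get(sentence, [])


result = targeted_paraphrase(
    "he kicked the bucket", toy_translate, toy_looks_wrong, toy_paraphrase
)
print(result)
```

Neither stand-in needs knowledge of both languages: the check inspects only target-language text, and the paraphraser inspects only source-language text, which is the property the approach relies on.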
