TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain

Large amount of parallel corpora is required for building Statistical Machine Translation (SMT) systems. We describe the TransDoop system for gathering translations to create parallel corpora from online crowd workforce who have familiarity with multiple languages but are not expert translators. Our system uses a Map-Reduce-like approach to translation crowdsourcing where sentence translation is decomposed into the following smaller tasks: (a) translation of constituent phrases of the sentence; (b) validation of quality of the phrase translations; and (c) composition of complete sentence translations from phrase translations. TransDoop incorporates quality control mechanisms and easy-to-use worker user interfaces designed to address issues with translation crowdsourcing. We have evaluated the crowd’s output using the METEOR metric. For a complex domain like judicial proceedings, the higher scores obtained by the map-reduce based approach compared to complete sentence translation establishes the efficacy of our work.

[1]  Aniket Kittur,et al.  CrowdForge: crowdsourcing complex work , 2011, UIST.

[2]  Chris Callison-Burch,et al.  Creating Speech and Language Data With Amazon’s Mechanical Turk , 2010, Mturk@HLT-NAACL.

[3]  Chris Callison-Burch,et al.  Crowdsourcing Translation: Professional Quality from Non-Professionals , 2011, ACL.

[4]  Pushpak Bhattacharyya,et al.  Experiences in Resource Generation for Machine Translation through Crowdsourcing , 2012, LREC.

[5]  Lydia B. Chilton,et al.  Exploring iterative and parallel human computation processes , 2010, HCOMP '10.

[6]  Stephan Vogel,et al.  A Semi-Supervised Word Alignment Algorithm with Partial Manual Alignments , 2010, WMT@ACL.

[7]  Matt Post,et al.  Constructing Parallel Corpora for Six Indian Languages via Crowdsourcing , 2012, WMT@NAACL-HLT.

[8]  Hitoshi Isahara,et al.  The Impact of Crowdsourcing Post-editing with the Collaborative Translation Framework , 2012, JapTAL.

[9]  Stephan Vogel,et al.  Can Crowds Build parallel corpora for Machine Translation Systems? , 2010, Mturk@HLT-NAACL.

[10]  Alon Lavie,et al.  METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.

[11]  Brendan T. O'Connor,et al.  Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks , 2008, EMNLP.

[12]  Jaime G. Carbonell,et al.  Active Learning and Crowd-Sourcing for Machine Translation , 2010, LREC.

[13]  Chris Callison-Burch,et al.  Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon’s Mechanical Turk , 2009, EMNLP.