Crowdsourced Monolingual Translation

An enormous potential exists for solving certain classes of computational problems through rich collaboration among crowds of humans supported by computers. Solutions to these problems have traditionally relied on human professionals, who are expensive to hire or difficult to find. Fully automatic systems, despite significant advances, still have much room for improvement. Recent research has involved recruiting large crowds of skilled humans (“crowdsourcing”), but crowdsourcing solutions are still restricted by the availability of those skilled participants. With translation, for example, professional translators are costly and not always available; machine translation systems have improved greatly in recent years but still provide only passable translations; and crowdsourced translation is limited by the availability of bilingual humans. This article describes crowdsourced monolingual translation, a collaborative form of translation performed by two crowds of monolingual people, one speaking only the source language and the other only the target language, with machine translation as the mediating device. It presents a general protocol for crowdsourced monolingual translation and analyzes three systems that implement the protocol. These systems were studied in various settings and were found to yield significant improvements in quality over both machine translation and monolingual editing of machine translation output (“postediting”).
